Reading Seminar
-
Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation (USENIX, 2024) (Discussion) In many ways, compilation is destructive: high-level constructs, variable names, and comments present in the source code are often lost. Nevertheless, decompilers aim to recover the source from a binary, and results vary between decompilers. So, what is a good decompilation? One could argue that fewer goto statements indicate better decompilation, because many gotos signify that the decompiler failed to structure the control flow. This paper argues instead that good decompilation resembles the source code, and points out that 3,754 goto statements are present in the Linux kernel. Since a goto can be intentional on the developer's part, the authors argue we should separate intended gotos from unintended ones. They investigate the gcc compiler to identify the transformation passes that cause unintended gotos, and then propose a structuring algorithm that deoptimizes the code to remove unintended gotos while preserving intended ones. SAILR, the proposed structuring algorithm, is implemented in the angr decompiler and performs competitively against state-of-the-art decompilers. The paper evaluates SAILR on 7,355 functions from 26 popular Debian packages; however, the number of functions seems oddly low considering the number of packages used. -
DETECTING PRETRAINING DATA FROM LARGE LANGUAGE MODELS (ICLR, 2024) (Discussion) This paper proposes MIN-K% PROB, a membership inference method for Large Language Models (LLMs). The problem it tackles is detecting pretraining data, which is challenging because LLM developers do not disclose the data used for pretraining, and because pretraining typically sees each data instance only once, leaving a weak memorization signal. The paper therefore assumes that the membership inference detector cannot know the distribution of the pretraining data, meaning there is no reference model (e.g., a shadow model) of the kind used in prior membership inference techniques. Consequently, the paper proposes the reference-free method MIN-K% PROB: it selects the k% of tokens with the lowest probability as a set of outliers, averages their log-probabilities, and compares that score against a threshold to infer membership or non-membership. The method seems efficient, as it infers membership from token-level probabilities alone. Additionally, using this method to evaluate unlearning techniques, as demonstrated in the case study, would be beneficial. -
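The scoring rule above fits in a few lines of code. This is a minimal sketch of my reading of it; the function names, the default k, and the thresholding convention are my assumptions, not the paper's exact implementation:

```python
def min_k_prob_score(token_logprobs, k=0.2):
    """Score a text by averaging the log-probabilities of its k% least-likely tokens.

    `token_logprobs` is the per-token log-probability sequence produced by the
    target model. A low (very negative) score means the text contains many
    surprising tokens, suggesting it was NOT in the pretraining data; a high
    score suggests membership.
    """
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # the k% outlier (lowest-probability) tokens
    return sum(lowest) / n

def is_member(token_logprobs, threshold, k=0.2):
    # Texts scoring above the threshold are inferred to be pretraining members.
    return min_k_prob_score(token_logprobs, k) > threshold
```

Intuitively, a member text contains few surprising tokens, so even its worst-scoring tokens keep a relatively high probability, pushing the average above the threshold.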
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content (S&P, 2024) (Discussion) The paper investigates the application of prompt learning with Large Language Models (LLMs) such as GPT-3 and T5 to address the issue of toxic online content. It focuses on three primary tasks: toxicity classification, toxic span detection, and detoxification, showing that prompt learning can perform as well as or even better than traditional models specifically trained for these tasks. Notably, this approach achieves a 10% improvement in toxicity classification and significantly lowers toxicity scores in detoxification tasks while maintaining the semantic integrity of the content. The study is distinguished by its innovative use of prompt tuning for managing toxic content and its thorough evaluation across five model architectures and eight different datasets, which verifies the method's effectiveness and efficiency. However, despite these strengths, the approach's dependency on dataset quality and its uncertain ability to generalize to unseen toxic content remain potential weaknesses, as do the complexity of designing effective prompts and the possibility of the techniques being misused. -
ANUBIS: a provenance graph-based framework for advanced persistent threat detection (SAC, 2022) (Discussion) This paper proposes ANUBIS, a provenance graph-based supervised APT detection framework. ANUBIS leverages a Bayesian Neural Network (BNN) to overcome limitations of current APT detection methods that use provenance graphs. The authors assume that an attacker can neither corrupt the provenance graphs used to train ANUBIS nor tamper with the event-logging mechanism of the host system. These assumptions give the security scenario a strong premise, but they are limiting because they may not hold in practice. Nevertheless, the paper is significant in demonstrating that its new graph-neighborhood encoding method is effective at predicting the nature of an attacker's activity. -
The Circle Of Life: A Large-Scale Study of The IoT Malware Lifecycle (USENIX, 2021) (Discussion) This paper is a measurement study of the IoT malware lifecycle. It analyzes around 166K Linux-based IoT malware samples collected over a year to characterize IoT malware. The paper concludes with some characteristics that differentiate IoT malware from traditional malware, such as the fact that most IoT malware is a variant of the Mirai botnet. However, there do not seem to be any unexpected findings. A slight complaint is that the figures in the paper have unclear captions; although the paper explains the concepts, some figures are difficult to understand. -
How Machine Learning Is Solving the Binary Function Similarity Problem (USENIX, 2022) (Discussion) This paper proposes a dataset composition for testing the latest Binary Code Similarity Detection (BCSD) techniques and analyzes the test results of various BCSD models, including recent GNN-based and embedding-based approaches. However, in the evaluation section, the results for similar and dissimilar function pairs should be split into separate tables to improve clarity. Additionally, many of the abbreviations used in the dataset composition are left unexplained, making the paper difficult to follow. -
Quark: Controllable Text Generation with Reinforced [Un]learning (NeurIPS, 2022) (Discussion) The paper proposes an approach that optimizes a reward function, in the spirit of Reinforcement Learning (RL), to mitigate undesired generation behaviors. The authors detail a three-stage algorithm comprising exploration, quantization, and learning, each critical to the model's development. Notably, the paper demonstrates the model's effectiveness through three distinct evaluations: reducing toxicity, steering away from unwanted sentiment, and minimizing repetition. This structured evaluation underscores the model's capability to unlearn specific unwanted behaviors, outperforming both baselines and state-of-the-art RL methods. The paper defines three undesirable behaviors and proposes a reward function for each; however, training a separate model for each behavior seems inefficient. -
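The quantization stage can be sketched in a few lines. This is a hedged illustration of the idea only; the bin count, the reward-token spellings, and the tie handling are mine, not Quark's exact implementation:

```python
def quantize_by_reward(samples, rewards, num_bins=5):
    """Sort (sample, reward) pairs and split them into equal-size quantile bins.

    Each bin is tagged with a reward token r_0 (worst) .. r_{K-1} (best).
    During the learning stage the model is trained to generate each sample
    conditioned on its bin's token; at inference time, conditioning on the
    best token steers generation away from the unwanted behavior.
    """
    order = sorted(range(len(samples)), key=lambda i: rewards[i])
    bins = {f"r_{b}": [] for b in range(num_bins)}
    for rank, i in enumerate(order):
        b = min(rank * num_bins // len(order), num_bins - 1)
        bins[f"r_{b}"].append(samples[i])
    return bins
```

Because the reward token is just another input token, a single exploration/quantization/learning loop per behavior suffices, which is also why a separate model per behavior feels wasteful.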
A Comprehensive Detection Method for the Lateral Movement Stage of APT Attacks (IEEE, 2023) (Discussion) This paper designs a multidimensional detection framework, based on the SMB protocol, to detect lateral movement behavior of APT attacks in an intranet environment. However, contrary to this stated purpose, the experimental results are disappointing: they measure how well malware is classified rather than how well lateral movement is detected, so the purpose and the results diverge. -
Anomaly detection in a forensic timeline with deep autoencoders (JISA, 2021) (Discussion) Systems generate logs to record internal events, and these logs are used for forensic investigation after a cyber incident. Unfortunately, a system generates numerous logs during its runtime, and the majority of the analysis is manual. This paper proposes a deep autoencoder for anomaly detection that assists analysts by highlighting anomalous events in Linux kernel system logs. Anomaly detection is a common application for autoencoders, used to analyze network traffic, logs, etc., and it is difficult to identify what distinguishes this paper from previous work. Furthermore, the proposed model is evaluated on an old dataset against classical ML methods such as SVMs; it would have been preferable to see the model's performance on an up-to-date dataset against state-of-the-art models. -
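The detection step in this line of work boils down to thresholding reconstruction error: an autoencoder trained on normal activity reconstructs normal events well, and poorly reconstructed events are surfaced to the analyst. Here is a minimal sketch assuming a simple mean-plus-z-sigma cutoff; the paper's actual threshold rule may differ:

```python
import statistics

def flag_anomalies(reconstruction_errors, z=3.0):
    """Flag events whose autoencoder reconstruction error is an outlier.

    `reconstruction_errors` holds one per-event error (e.g., mean squared
    error between a log event's feature vector and its reconstruction).
    Events whose error exceeds the mean by more than z standard deviations
    are reported as anomalous. The z-score cutoff is a common convention,
    not necessarily the paper's.
    """
    mu = statistics.mean(reconstruction_errors)
    sigma = statistics.pstdev(reconstruction_errors)
    cutoff = mu + z * sigma
    return [i for i, e in enumerate(reconstruction_errors) if e > cutoff]
```

The returned indices point the analyst at the handful of timeline events worth manual inspection, which is the labor-saving claim of this kind of system.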
Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures (ASIA CCS, 2023) (Discussion) There are already efficient ways to represent binary functions with a tokenizer (e.g., BPE) over assembly language. I am curious why the authors instead divide functions into seven features, such as Vex-IR instructions, LibcCalls, and Constants, and create a specific tokenizer for each feature to generate one-hot vectors, and I wonder whether this approach is genuinely effective. The explanation of whether this tokenization scheme positively impacts Binary Code Similarity Detection (BCSD) performance seems insufficient. -
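To make the design in question concrete, here is a hedged sketch of what a per-feature tokenizer producing one-hot vectors might look like; the feature names and token values below are illustrative, and the paper's seven features and vocabularies are more elaborate:

```python
def build_vocab(tokens):
    """Assign each distinct token an index: a minimal per-feature tokenizer."""
    return {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}

def one_hot(token, vocab):
    """Encode a token as a one-hot vector over its own feature's vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

# Two of the seven feature streams, with made-up token values.
libc_vocab = build_vocab(["memcpy", "strlen", "memcpy", "free"])
const_vocab = build_vocab(["0x0", "0x10", "0xff"])
```

Each feature gets its own small vocabulary, so a constant and a libc call never share an embedding slot; whether this separation actually improves BCSD accuracy over a single BPE tokenizer is exactly the question the discussion raises.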
Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers (AAAI, 2024) (Discussion) The paper utilizes adversarial examples to retain the original decision boundary during unlearning. This approach is intriguing, yet it raises the question of what it means to “forget” a data point: the paper treats misclassification as forgetting, yet the “Right to be Forgotten” seems to imply removing the influence of the training data itself. -
From Grim Reality to Practical Solution: Malware Classification in Real-World Noise (S&P, 2023) (Abstract) Malware datasets inevitably contain incorrect labels due to the shortage of expertise and experience needed for sample labeling. Previous research demonstrated that a training dataset with incorrectly labeled samples would result in inaccurate model learning. To address this problem, researchers have proposed various noise learning methods to offset the impact of incorrectly labeled samples, and in image recognition and text mining applications, these methods demonstrated great success. In this work, we apply both representative and state-of-the-art noise learning methods to real-world malware classification tasks. We surprisingly observe that none of the existing methods could minimize incorrect labels’ impact. Through a carefully designed experiment, we discover that the inefficacy mainly results from extreme data imbalance and the high percentage of incorrectly labeled data samples. As such, we further propose a new noise learning method and name it after MORSE. Unlike existing methods, MORSE customizes and extends a state-of-the-art semi-supervised learning technique. It takes possibly incorrectly labeled data as unlabeled data and thus avoids their potential negative impact on model learning. In MORSE, we also integrate a sample re-weighting method that balances the training data usage in the model learning and thus handles the data imbalance challenge. We evaluate MORSE on both our synthesized and real-world datasets. We show that MORSE could significantly outperform existing noise learning methods and minimize the impact of incorrectly labeled data. -
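The sample re-weighting idea in the abstract can be illustrated with a generic class-balancing scheme. This is a sketch of the concept only, not MORSE's exact formulation:

```python
from collections import Counter

def balanced_sample_weights(labels):
    """Give each sample a weight inversely proportional to its class frequency.

    Rare malware families then contribute as much total weight to the loss as
    common ones, countering the extreme data imbalance the paper identifies.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return [total / (num_classes * counts[y]) for y in labels]
```

Under this scheme the weights of each class sum to the same value, so during training a minority family is no longer drowned out by a dominant one.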
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation (EMNLP, 2021) (Abstract) Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently shown to transfer well to Programming Languages (PL) and largely benefit a broad set of code-related tasks. Despite their success, most current methods either rely on an encoder-only (or decoder-only) pre-training that is suboptimal for generation (resp. understanding) tasks or process the code snippet in the same way as NL, neglecting the special characteristics of PL such as token types. We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning. Besides, we propose a novel identifier-aware pre-training task that enables the model to distinguish which code tokens are identifiers and to recover them when they are masked. Furthermore, we propose to exploit the user-written code comments with a bimodal dual generation task for better NL-PL alignment. Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our model can better capture semantic information from code. Our code and pre-trained models are released at https://github.com/salesforce/CodeT5. -
InCoder: A Generative Model for Code Infilling and Synthesis (ICLR, 2023) (Abstract) Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via masking and infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first large generative code model that is able to infill arbitrary regions of code, which we evaluate in a zero-shot setting on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale. Our models and code are publicly released. -
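The mask-and-move training transformation described in the abstract is easy to sketch: a region of the file is cut out, replaced by a sentinel, and appended at the end, so a left-to-right model still gets to condition on both sides of the hole. Sentinel token spellings here are illustrative, not InCoder's exact vocabulary:

```python
def mask_for_infilling(code, start, end, mask_token="<MASK:0>", eom="<EOM>"):
    """Move a region of code to the end of the file behind a sentinel.

    The model sees the left AND right context before the masked span, so it
    learns to infill with bidirectional context while still being trained
    with an ordinary left-to-right objective.
    """
    left, span, right = code[:start], code[start:end], code[end:]
    return left + mask_token + right + mask_token + span + eom
```

At inference time, generation is prompted with everything up to the second sentinel, and the tokens emitted before the end-of-mask token become the infilled region.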
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (NeurIPS, 2022) (Abstract) Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical limitations. In particular, they often follow a standard supervised fine-tuning procedure to train a code generation model only from the pairs of natural-language problem descriptions and ground-truth programs. Such paradigm largely ignores some important but potentially useful signals in the problem specification such as unit tests, which thus often results in poor performance when solving complex unseen coding tasks. To address the limitations, we propose “CodeRL”, a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, we treat the code-generating LM as an actor network, and introduce a critic network that is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor. During inference, we introduce a new generation procedure with a critical sampling strategy that allows a model to automatically regenerate programs based on feedback from example unit tests and critic scores. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data. Our method not only achieves new SOTA results on the challenging APPS benchmark, but also shows strong zero-shot transfer capability with new SOTA results on the simpler MBPP benchmark. -
Ground Truth for Binary Disassembly is Not Easy (USENIX, 2022) (Abstract) Modern disassembly tools often rely on empirical evaluations to validate their performance and discover their limitations, thus promoting long-term evolvement. To support the empirical evaluation, a foundation is the right approach to collect the ground truth knowledge. However, there has been no unanimous agreement on the approach we should use. Most users pick an approach based on their experience or will, regardless of the properties that the approach presents. In this paper, we perform a study on the approaches to building the ground truth for binary disassembly, aiming to shed light on the right way for the future. We first provide a taxonomy of the approaches used by past research, which unveils five major mechanisms behind those approaches. Following the taxonomy, we summarize the properties of the five mechanisms from two perspectives: (i) the coverage and precision of the ground truth produced by the mechanisms and (ii) the applicable scope of the mechanisms (e.g., what disassembly tasks and what types of binaries are supported). The summarization, accompanied by quantitative evaluations, illustrates that many mechanisms are ill-suited to support the generation of disassembly ground truth. The mechanism best serving today’s need is to trace the compiling process of the target binaries to collect the ground truth information. Observing that the existing tool to trace the compiling process can still miss ground truth results and can only handle x86/x64 binaries, we extend the tool to avoid overlooking those results and support ARM32/AArch64/MIPS32/MIPS64 binaries. We envision that our extension will make the tool a better foundation to enable universal, standard ground truth for binary disassembly. -
Black-box Attacks Against Neural Binary Function Detection (RAID, 2023) (Abstract) Binary analyses based on deep neural networks (DNNs), or neural binary analyses (NBAs), have become a hotly researched topic in recent years. DNNs have been wildly successful at pushing the performance and accuracy envelopes in the natural language and image processing domains. Thus, DNNs are highly promising for solving binary analysis problems that are hard due to a lack of complete information resulting from the lossy compilation process. Despite this promise, it is unclear that the prevailing strategy of repurposing embeddings and model architectures originally developed for other problem domains is sound given the adversarial contexts under which binary analysis often operates. In this paper, we empirically demonstrate that the current state of the art in neural function boundary detection is vulnerable to both inadvertent and deliberate adversarial attacks. We proceed from the insight that current generation NBAs are built upon embeddings and model architectures intended to solve syntactic problems. We devise a simple, reproducible, and scalable black-box methodology for exploring the space of inadvertent attacks – instruction sequences that could be emitted by common compiler toolchains and configurations – that exploits this syntactic design focus. We then show that these inadvertent misclassifications can be exploited by an attacker, serving as the basis for a highly effective black-box adversarial example generation process. We evaluate this methodology against two state-of-the-art neural function boundary detectors: XDA and DeepDi. We conclude with an analysis of the evaluation data and recommendations for ho -
SelectiveTaint: Efficient Data Flow Tracking With Static Binary Rewriting (USENIX, 2021) (Abstract) Taint analysis has been widely used in many security applications such as exploit detection, information flow tracking, malware analysis, and protocol reverse engineering. State-of-the-art taint analysis tools are usually built atop dynamic binary instrumentation, which instruments at every possible instruction, and rely on runtime information to decide whether a particular instruction involves taint or not, thereby usually having high performance overhead. This paper presents SELECTIVETAINT, an efficient selective taint analysis framework for binary executables. The key idea is to selectively instrument the instructions involving taint analysis using static binary rewriting instead of dynamic binary instrumentation. At a high level, SELECTIVETAINT statically scans taint sources of interest in the binary code, leverages value set analysis to conservatively determine whether an instruction operand needs to be tainted or not, and then selectively taints the instructions of interest. We have implemented SELECTIVETAINT and evaluated it with a set of binary programs including 16 coreutils (focusing on file I/O) and five network daemon programs (focusing on network I/O) such as nginx web server. Our evaluation results show that the binaries statically instrumented by SELECTIVETAINT has superior performance compared to the state-of-the-art dynamic taint analysis frameworks (e.g., 1.7x faster than that of libdft).