Reading Seminar
-
▶ Instruction Backdoor Attacks Against Customized LLMs (USENIX, 2024) (Discussion) The study explores instruction-based backdoor attacks on large language models (LLMs), targeting their instruction-following capabilities through crafted prompts embedding backdoor instructions. It categorizes attacks into word-level, syntax-level, and semantic-level, showing high attack success rates (ASR) while maintaining clean input utility. Larger models like GPT-4 and Claude-3 exhibit higher ASRs, making them more susceptible than smaller models such as LLaMA2 and Mistral. However, the attack's effectiveness depends heavily on trigger design, including length, position, and complexity, with semantic-level attacks being stealthier but less versatile in simpler tasks. The study highlights challenges in defending against such attacks, as existing detection methods like intent analysis yield high false alarm rates and limited accuracy in identifying more covert triggers. Practical limitations include the reliance on static prompt structures, assuming attackers have precise knowledge of prompt formats and tasks, which may not generalize to dynamic or adversarial real-world scenarios. Additionally, the approach has not been tested extensively across diverse LLM architectures or specialized domains, leaving gaps in understanding broader vulnerabilities and mitigation strategies. -
▶ PLeak: Prompt Leaking Attacks against Large Language Model Applications (CCS, 2024) (Discussion) The paper introduces PLeak, an innovative framework designed to extract confidential system prompts from LLM applications, which are often protected as intellectual property. PLeak automates the process using a closed-box optimization approach, relying on shadow LLMs and datasets to simulate the target environment. It uses techniques like incremental query optimization and response aggregation to reconstruct system prompts effectively. The framework significantly outperforms previous methods, achieving high accuracy in metrics such as Exact Match and Semantic Similarity, and successfully reconstructs prompts in 68% of real-world tests on the Poe platform. However, PLeak has limitations, including its dependence on shadow environments that closely resemble the target, potential scalability issues with computationally intensive optimizations, and only partial exploration of defenses like output filtering or obfuscation. While it demonstrates critical security risks in LLM applications, the lack of broad evaluation across diverse platforms and limited engagement from reported parties like Poe raise concerns about its practical implications and ethical considerations. -
▶ Deepfake CAPTCHA: A Method for Preventing Fake Calls (ASIA CCS, 2023) (Discussion) This paper introduces D-CAPTCHA, an innovative active defense mechanism against real-time deepfakes. Unlike traditional passive detection methods that rely on analyzing patterns or artifacts, D-CAPTCHA challenges the capabilities of deepfake models by requiring the attacker to perform tasks that are easy for humans but difficult for AI to replicate, such as humming a tune or speaking with emotion. Experimental results demonstrate that D-CAPTCHA significantly enhances detection accuracy, achieving 91-100% compared to state-of-the-art methods. While it offers long-term effectiveness and extensibility in combating audio-based deepfake threats, its practical application is limited to real-time scenarios, and it may impact user experience if not carefully implemented. Furthermore, it is worth noting a minor inconsistency in Table 1, where the acronym "R" is used for both Repeat Accent and Raspberry, which could be corrected for clarity. Overall, D-CAPTCHA represents a groundbreaking approach to addressing the growing threat of deepfake technology and highlights the potential for proactive AI-driven security solutions that adapt to evolving adversarial capabilities. -
▶ Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models (USENIX, 2024) (Discussion) Since the advent of Large Language Models (LLMs), there has been increasing interest in the concept of 'Jailbreak Prompts' for these LLMs. Since LLMs are trained on a large corpus of data, they can generate harmful content. As a result, LLM services provide 'Aligned Language Models' that aim to prevent generating harmful content. The 'Jailbreak Prompts' are prompts that allow an adversary to bypass the defense and cause LLMs to generate harmful content. This paper conducts a measurement study on existing Jailbreak Prompt methods to answer three research questions: 'What are jailbreak strategies and how do they work?', 'How do humans develop and execute jailbreak attacks?', and 'How to automate the process of jailbreaking?'. While the paper overviews the current landscape of jailbreak prompts, the metrics defined in the paper (Expected Maximum Harmfulness (EMH) and Jailbreak Success Rate(JSR)) are difficult to comprehend. For instance, in Table 3 the authors present the metrics for pairs of jailbreak prompts and malicious queries. Unfortunately, it is difficult to comprehend the result as EMH does not seem to be normalized (exceeding 1.0 in certain cases) and the overall JSR is very low, causing the reader to question the effectiveness of the attack. -
▶ RustSan: Retrofitting AddressSanitizer for Efficient Sanitization of Rust (USENIX, 2024) (Discussion) This paper delve into the innovative contributions of RustSan, assessing its implications in the Rust programming ecosystem. While the integration of AddressSanitizer into Rust programming through selective instrumentation significantly reduces runtime overhead, it also raises questions about the balance between optimization and comprehensiveness of memory safety checks. One needs to consider the scenarios where safe and overlapping memory regions are crucial yet may be overridden inadvertently in corner cases. And all hopes on static analysis for detection of unsafe blocks may fall short of considering dynamic interactions which only appear at runtime and thus present challenges in practical applications. While this may sound promising, performance gains shown in benchmarks will have to be investigated further in larger-scale applications with less predictable memory access patterns. Finally, though the approach taken by RustSan mitigates some of the inherent trade-offs between flexibility and safety in Rust, it remains important to establish the robustness of the tool in the face of complex programming paradigms and its ease of integration into existing workflows for developers. This prompts an ongoing evaluation of its long-term reliability and adaptability. -
▶ Uncovering and Exploiting Hidden APIs in Mobile Super Apps (CCS, 2023) (Discussion) This paper makes a valuable contribution by systematically revealing undocumented APIs and their potential security exploits in popular “super apps” such as WeChat and TikTok. By leveraging both static and dynamic analysis, the authors’ tool, APIScope, successfully identifies hidden APIs and demonstrates their risks, thereby advancing our understanding of super app ecosystems and offering actionable guidance for platform vendors and developers. However, several limitations remain. The study focuses exclusively on Android versions of super apps using the V8 engine, narrowing the generalizability of its findings and excluding large market segments, like iOS platforms or super apps employing different JavaScript engines. Moreover, while APIScope identifies hidden APIs and validates certain vulnerabilities through test miniapps, it does not fully explore how organizational factors—such as development practices, code review processes, or internal security policies—contribute to these undocumented APIs. The paper also falls short of presenting robust, long-term defenses or systematic industry-level guidelines for mitigating undocumented APIs. A deeper integration of automated patching, secure coding frameworks, and clearer vendor-side standardization efforts would strengthen the practical impact. Future work should strive to broaden platform scope, thoroughly investigate root causes behind the emergence of undocumented APIs, and propose more comprehensive preventative strategies. -
▶ Teach LLMs to Phish: Stealing Private Information from Language Models (ICLR, 2024) (Discussion) The paper introduces neural phishing, a method to extract sensitive information from large language models (LLMs) by injecting benign-looking poisoned data into their training datasets. The attack has three phases: poisoning during pretraining, memorization during fine-tuning, and information extraction during inference. Success rates range from 10% to 80%, depending on the model size, data duplication, and attacker knowledge. Larger and overtrained models are more vulnerable. Standard defenses like deduplication and differential privacy are ineffective. The work highlights significant privacy risks in LLMs and emphasizes the need for advanced defenses. While the study is significant, its limitations include reliance on poisoned data appearing before secrets in training, difficulty in fully preventing memorization of sensitive patterns, and the low likelihood of individuals leaking sensitive data directly to LLMs. -
▶ Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models (USENIX, 2024) (Discussion) The paper presents a novel method to infer whether a document was included in the training dataset of large language models (LLMs). Using a black-box approach based on predicted token-level probabilities and normalization techniques, the authors develop a meta-classifier to distinguish between member and non-member documents. Evaluated on OpenLLaMA with datasets from Project Gutenberg and ArXiv, the method achieves AUC scores of 0.856 for books and 0.678 for papers, demonstrating its effectiveness. The study highlights that even small LLMs retain significant memorization, raising concerns about privacy and transparency in LLM training data. However, the approach relies on known datasets, limiting its applicability to proprietary models without accessible training data. It also assumes that non-member documents are similar but temporally distinct, which may not always hold. Additionally, the method's focus on black-box inference may not address vulnerabilities in fine-tuned or modified LLMs, warranting further research into privacy mitigation strategies. -
▶ Secrets Revealed in Container Images: An Internet-wide Study on Occurrence and Impact (ASIA CCS, 2023) (Discussion) This study effectively highlights a critical security vulnerability in containerized applications, revealing that 8.5% of analyzed Docker images contain sensitive information such as cryptographic keys or API secrets. The large-scale dataset and rigorous filtering enhance the reliability of findings, providing strong evidence that secret leakage is a systemic issue rather than isolated errors. This insight underscores the need for structural changes in the Docker paradigm, offering valuable recommendations such as improved secret scanning tools and stricter default configurations. However, the reliance on regex-based detection may limit the identification of unconventional leaks, and the inability to validate API secrets due to ethical constraints reduces the completeness of the analysis. Furthermore, the discussion could have delved deeper into human and organizational factors, such as user training, to address the root causes of these leaks. Despite these limitations, the study makes significant contributions by exposing the widespread impact of secret leakage, affecting over 275,000 Internet-facing hosts, and providing a foundation for collaborative efforts to mitigate these vulnerabilities. This research holds substantial value for improving container security practices and guiding future innovations in secret management. -
▶ AI Psychiatry: Forensic Investigation of Deep Learning Networks in Memory Images (USENIX, 2024) (Discussion) After the advent of deep learning (DL) models, several practical attacks against DL have surfaced. Consequently, the research community developed a complementary vetting technique to detect such attacks. However, these techniques do not address how to retrieve these DL models for inspection. Existing state-of-the-art memory forensic techniques rely on static data structure recovery, which cannot recover complex data structures used in DL software stacks. Furthermore, ML frameworks employ advanced garbage collection and memory management optimization, which obfuscates in-memory objects. Moreover, the layer weights are stored in a consecutive GPU memory while the context of these objects is stored in the CPU data structure. Lastly, white-box analysis techniques must rehost the DL model in a live environment for inspection. This paper proposes AiP, an automated system for recovering DL models from meory and rehosting them into a live process. AiP requires no prior knoweldge of the particular model. In the evaluation using LISA, CIFAR-10, and IMDB dataset, AiP successfully recovered 30 different models and achieved 100% recover accuracy. The paper demonstrates its practicality by recovering popular Python-based frameworks (Pytorch and Tensorflow), however acknowledges the existance of other ML frameworks (e.g., HuggingFace). -
▶ BUDAlloc: Defeating Use-After-Free Bugs by Decoupling Virtual Address Management from Kernel (USENIX, 2024) (Discussion) This paper presents BUDAlloc, a novel one-time allocator (OTA) designed to address use-after-free (UAF) vulnerabilities by decoupling virtual address management from the kernel. Unlike conventional approaches, BUDAlloc shifts virtual memory operations to the user level, reducing system call overhead and improving performance, especially in multi-threaded applications. Using eBPF-based page fault handling, BUDAlloc efficiently manages virtual aliases and batches memory operations to minimize fragmentation. It offers two modes—detection and prevention—allowing users to prioritize either bug detection or performance. Evaluations show that BUDAlloc improves performance by 15% over DangZero and reduces memory overhead by 61% compared to FFmalloc, making it a scalable and practical solution for UAF bug detection without modifying existing binaries. -
▶ Exploring ChatGPT’s Capabilities on Vulnerability Management (USENIX, 2024) (Discussion) The paper, Exploring ChatGPT’s Capabilities on Vulnerability Management, examines ChatGPT's ability to assist in six stages of vulnerability management, utilizing a dataset of 70,346 samples. The authors compare ChatGPT's performance with state-of-the-art (SOTA) methods, assessing tasks like bug report summarization and vulnerability repair. While ChatGPT demonstrates promising potential in some areas, such as summarizing bug reports, challenges persist, particularly in patch correctness assessment. The paper's strengths include a comprehensive dataset and innovative exploration of prompt engineering techniques. However, it faces limitations in generalizing ChatGPT’s performance across all tasks, pointing to future research needs for refining its application in security tasks. This study is significant in highlighting ChatGPT’s growing role in automated vulnerability management, with the potential for advancing research in the domain. -
▶ WebRR: A Forensic System for Replaying and Investigating Web-Based Attacks in The Modern Web (USENIX, 2024) (Discussion) This paper presents WEBRR (Web-based Enterprise Record and Replay), a novel forensic system designed for replaying and investigating web-based attacks in modern web environments. The authors address the limitations of existing forensic analysis tools in dealing with the complexity of modern web applications and browsers. WEBRR achieves deterministic replay of web attacks by leveraging the single-threaded nature of JavaScript and implementing a comprehensive recording and replay mechanism for various web components, including the Document Object Model (DOM), network requests, and asynchronous operations. The system is implemented as an extension to the Chromium browser and demonstrates high accuracy in replaying both benign websites and malicious attacks across multiple platforms (Linux, Windows, and Android). The authors evaluate WEBRR against existing solutions like WebCapsule, Mugshot, and RR, showing superior performance in replaying complex web attacks. The system exhibits low runtime overhead (median 2.94% increase in page load time) and reasonable storage requirements (17 MB per minute of browsing). While WEBRR has some limitations, such as incomplete support for certain web technologies (e.g., WebRTC, IndexedDB), the authors argue that these can be addressed with additional engineering effort. Overall, WEBRR represents a significant advancement in web forensics, enabling more accurate and interactive investigations of web-based attacks. -
▶ Can I Hear Your Face? Pervasive Attack on Voice Authentication Systems with a Single Face Image (USENIX, 2024) (Discussion) This paper presents a novel deepfake attack called Foice. This attack generates a synthetic voice of a victim using just a single face image, without requiring any voice samples. The generated voice is realistic enough to fool commercial voice authentication systems, such as those used by WeChat, Microsoft Azure, and other platforms. The core idea behind Foice is to learn the partial correlation between face and voice features, and then combine these with random face-independent voice features generated from a Gaussian distribution. The attack's effectiveness was demonstrated through real-world experiments on several authentication systems and voice assistants, where it showed high success rates, indicating significant vulnerabilities in these systems. The study emphasizes the importance of new defenses against such attacks, as current systems lack sufficient protection against deepfake threats based solely on facial images. -
▶ Decoding the MITRE Engenuity ATT&CK Enterprise Evaluation: An Analysis of EDR Performance in Real-World Environments (ASIA CCS, 2024) (Discussion) The paper provides a comprehensive analysis of the MITRE ATT&CK evaluations, aiming to address the limitations of the raw evaluation results. The paper's strengths lie in its introduction of new analysis methods, including whole-graph analysis and holistic assessments, to gain deeper insights into EDR system capabilities. The authors' reconstruction of attack scenarios and subsequent analysis shed light on EDR systems' attack reconstruction, behavior correlation, and overall detection performance. However, the paper could be improved by addressing the lack of access to crucial information like false positive alarm volume, response time, and raw data, which limits the scope of the analysis. Despite this limitation, the paper offers valuable contributions to the field by providing a systematic interpretation of MITRE ATT&CK evaluation results, aiding researchers, practitioners, and vendors in understanding and improving EDR systems. -
▶ Racing on the Negative Force: Efficient Vulnerability Root-Cause Analysis through Reinforcement Learning on Counterexamples (USENIX, 2024) (Discussion) This paper presents RACING, a novel approach for efficient root cause analysis (RCA) in fuzzing. By leveraging counterexamples and reinforcement learning, RACING intelligently guides the fuzzing process to quickly identify the root cause of crashes. The experimental results demonstrate that RACING significantly outperforms the state-of-the-art technique, Aurora, in terms of both speed and accuracy. This work holds significant implications for improving the efficiency and effectiveness of vulnerability detection and analysis in software. However, limitations exist, such as the reliance on source code and the lack of support for compound predicates, which offer avenues for future research. Overall, RACING represents a promising advancement in automated RCA for fuzzing, with the potential to significantly impact the field of software security. -
▶ Data Coverage for Guided Fuzzing (USENIX, 2024) (Discussion) Code coverage faces a major challenge in that it only reflects a small part of a program's structure, leaving some crucial program constructs uncovered. To address this issue, this work proposes data coverage for guided fuzzing - a technique that focuses on detecting novel constant data references and maximizing their coverage. Additionally, real-world fuzzing practices were optimized by classifying data access according to semantics and designing customized collection strategies. This is crucial because improper handling of constant data can significantly impact fuzzing throughput. To further improve fuzzing efficiency, novel storage and utilization techniques were developed. Finally, libFuzzer was enhanced with data coverage capabilities and submitted to Google's FuzzBench for evaluation. In this evaluation, the proposed approach outperformed many state-of-the-art fuzzers and achieved the best coverage score in the experiment. As a result, using this enhanced code coverage, 28 previously unknown bugs in OSS-Fuzz projects were discovered. -
▶ Post Hoc Explanations of Language Models Can Improve Language Models (NeurIPS, 2023) (Discussion) LLMs demonstrate remarkable capabilities in various complex tasks. Recent research has delved into the possibility of enhancing the performance of these LLMs by incorporating human-annotated data. Unfortunately, this method is limited in scalability and underperforms in specific scenarios. This paper proposes a technique that can automatically generate natural language rationales using the output attribution scores that capture the influence of each feature on model predictions. The new framework, AMPLIFY, improves the prediction accuracy to about 10-25%, including those where prior approaches relying on human-annotated rationales fall short. A fundamental limitation of AMPLIFY is that it inherits the limitations of LLMs and Post hoc explanation methods. -
▶ Beyond Memorization: The Challenge of Random Memory Access in Language Models (ACL, 2024) (Discussion) This paper explores the memory access patterns of language models (LMs), particularly focusing on the challenges they face with random access to memorized information. The authors demonstrate that while LMs can sequentially reproduce stored content effectively, they struggle with accessing information in random segments, especially in the middle of memorized sequences. To address this, the paper proposes two strategies that "recitation" (having the model first recall the content) and "permutation" (shuffling sentence order) and shows through experiments that these methods can significantly improve the model's performance. The study provides valuable insights into the limitations of memory access in LMs and suggests practical ways to enhance their performance in real-world applications. However, the research is limited to decoder-only models and does not explore larger models beyond 7 billion parameters, which could offer further insights. Additionally, experiments are conducted on a fixed-size text corpus, leaving questions about scalability to larger pretraining datasets. Despite these limitations, the paper makes an important contribution to understanding and improving memory access in language models. -
▶ On Large Language Models’ Resilience to Coercive Interrogation (S&P, 2024) (Discussion) In this paper, a new weakness in Large Language Models (LLMs) is exposed that doesn’t rely on creating special prompts, known as jail-breaking. Instead, this method, called model interrogation, uses the fact that even when an LLM refuses to answer a harmful question, the damaging reply might still be hidden in the less obvious parts of its output. By accessing the list of probable responses (top-k token predictions) available in many open-source and commercial LLMs, a person with bad intentions can manipulate the model to reveal these harmful answers. They do this by choosing less likely words from the list at certain points in the response. This technique proves to be not only different but also more effective than traditional jail-breaking, with a 92% success rate compared to 62%, and it works 10 to 20 times faster. The harmful content brought out by this method is also of higher quality. Moreover, combining model interrogation with jail-breaking methods greatly improves results over using either technique alone. The study also shows that LLMs made for specific tasks like programming can still be tricked into giving harmful responses using this method. -
▶ LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked (Arxiv, 2024) (Discussion) The authors of the paper present a clever and straightforward approach to enhance the safety of large language models (LLMs). The idea is to have the LLM essentially double-check its own work for any potentially harmful content. When the LLM receives a prompt that could lead to a harmful response, it generates the text as usual. But then, this response is passed to another LLM that's been instructed to act as a filter. This filter LLM reads the generated text and makes a judgment call by evaluating "harmfulness". The beauty of this method lies in its simplicity - doesn't require any retraining or tweaking of the original LLM, also it doesn't involve any complex pre-processing of the input. They tested this "LLM self-defense" mechanism on two popular models, GPT-3.5 and LLaMA 2. In many cases, they were able to virtually eliminate the generation of harmful content. -
▶ Meta Large Language Model Compiler: Foundation Models of Compiler Optimization (Arxiv, 2024) (Discussion) This paper introduces LLM Compiler, a novel large language model designed for code optimization and compiler tasks. Building upon the Code Llama model, LLM Compiler is specifically trained to understand intermediate representations (IR) and assembly code, showing improved performance in code generation and compiler emulation tasks. The model demonstrates strong capabilities in handling compiler-related tasks and outperforms previous models in various benchmarks. However, the paper notes some limitations. The primary issue is the model's fixed input sequence length of 16k tokens, which restricts its ability to handle very large codebases effectively. Despite efforts to split large translation units, some remain too large for the model to process. Additionally, the accuracy of the model's outputs requires rigorous evaluation and verification to ensure correctness, as any suggested optimizations should be thoroughly tested. Despite these limitations, the paper makes significant contributions to the field of compiler optimization and provides a solid foundation for future research. -
▶ Unlearning Bias in Language Models by Partitioning Gradients (ACL, 2023) (Discussion) The paper 'Unlearning Bias in Language Models by Partitioning Gradients' presents a novel technique, partitioned contrastive gradient unlearning (PCGU), to debias pretrained masked language models. PCGU selectively optimizes weights that contribute to biases by calculating gradients for contrastive sentence pairs. Evaluations using StereoSet and CrowS Pairs datasets demonstrate PCGU's effectiveness in reducing gender-profession bias with minimal impact on language modeling performance. Additionally, PCGU shows potential in mitigating biases across other domains like race and religion. However, it would have been better if there were additional experiments depending on the size of the language model. Also This technique highlights the need for ongoing research to develop robust strategies for addressing bias in language models. -
▶ ProPILE: Probing Privacy Leakage in Large Language Models (NeurIPS, 2024) (Discussion) ProPILE, a novel probing tool designed to assess the risk of Personally Identifiable Information (PII) leakage in Large Language Models (LLMs), demonstrates the feasibility of extracting PII through strategic prompt engineering. By utilizing black-box probing for data subjects and white-box probing for LLM service providers, ProPILE enables the assessment of PII leakage in models like OPT-1.3B trained on the Pile dataset. Experiments reveal that the likelihood of target PII generation increases with more specific prompts or additional linked PII details, while white-box probing with access to a limited subset of training data significantly amplifies this leakage potential. These findings underscore the effectiveness of ProPILE as an assessment tool and highlight the need for ongoing research and robust mitigation strategies to address privacy vulnerabilities in LLMs. -
▶ Improving Real-world Password Guessing Attacks via Bi-directional Transformers (USENIX, 2023) (Discussion) This paper proposes a bi-directional transformer-based guessing framework, referred to as PassBERT, which applies the pre-training/fine-tuning paradigm to password guessing attacks. First, a model pre-trained on the general password distribution was prepared, which was then fine-tuned on three specifically designed attack approaches. These methods reflect real-world attack scenarios and include: 1) conditional password guessing, which recovers the complete password given a partial one; 2) targeted password guessing, which compromises the password of a specific user using personal information; and 3) adaptive rule-based password guessing, which selects rules to generate rule-transformed password candidates. The experimental results show that the fine-tuned models can outperform state-of-the-art models by 14.53%, 21.82%, and 4.86% in the three attacks, respectively, demonstrating the effectiveness of bi-directional transformers on downstream guessing attacks. Furthermore, a hybrid password strength meter was proposed to mitigate the risks from these three types of attacks. -
▶ Semantic Ranking for Automated Adversarial Technique Annotation in Security Text (ASIA CCS, 2024) (Discussion) This paper introduces an innovative multi-stage ranking system for extracting and annotating adversarial techniques from threat intelligence reports, leveraging language models fine-tuned for cybersecurity tasks. The system demonstrates significant improvements in accuracy and recall compared to previous methods, with enhanced performance on verbose datasets like MITRE ATT&CK Reports and WeLiveSecurity. The comprehensive approach, including testing across various datasets and the introduction of a public dataset with 6.6K annotations, strengthens the study's contributions to the field. However, the observed performance disparity across datasets highlights the need for further refinement to handle concise descriptions more effectively. Overall, the study advances automated threat technique annotation and provides a solid foundation for future cybersecurity research and development. -
▶ LogBERT: Log Anomaly Detection via BERT (IJCNN, 2021) (Discussion) Detecting anomalies within system logs is paramount to protecting the system from an attack or malfunction. Traditional methods employ regular expressions or machine learning models to identify anomalous events. These approaches depend on handcrafted features and are unable to capture temporal information. For example, malicious logs could be benign on their own, but the collection of these logs could be malicious. Recently, deep learning models such as recurrent neural networks (RNNs) have been widely used to evaluate these sequences and can capture temporal information. However, RNNs cannot encode contextual information in a bi-directional manner. This paper proposes a log anomaly detection model based on BERT. BERT can capture contextual information in a bi-directional manner making it suitable for this task. The model is evaluated on three datasets: Hadoop Distributed File System (HDFS), BlueGene/L Supercomputer System (BGL), and Thunderbird-mini dataset. The baseline models are Principal Component Analysis (PCA), One-Class SVM (OCSVM), IsolationForest (iForest), LogCluster, DeepLog, and LogAnomaly. Their proposed model demonstrates superior performance over all baseline state-of-the-art models in all datasets. However, the detailed process in training is not described. For example, BERT splits the training step into a pretraining and finetuning phase. However, LogBERT seems to only discuss the pretraining phase and have no mention about the fine tuning phase. As a result, it is not clear if the fine tuning phase is excluded or it is not mentioned. -
▶ Universal and Transferable Adversarial Attacks on Aligned Language Models (arxiv, 2023) (Discussion) This paper proposes an attack method that causes aligned language models to generate undesirable behaviors. Aligned language models refuse to respond to harmful queries. To enable these language models to respond to harmful queries, this paper attaches a suffix to the query. This approach makes the LLM generate a positive response instead of refusing to answer. Positive responses to harmful queries were obtained from ChatGPT, Bard, Claude, LLaMA-2-Chat, Pythia, and Falcon, with a much higher success rate for GPT-based models. The evaluation section only demonstrates the high success rate of the proposed attack method. It would have been better if the paper included the limitations of this study and a model ablation study section. -
▶ Membership Inference via Backdooring (IJCAI, 2022) (Discussion) The paper titled 'Membership Inference via Backdooring' presents a novel approach to addressing data privacy concerns in machine learning by introducing a method called Membership Inference via Backdooring (MIB). The main contribution of MIB lies in its ability to mark a small number of data samples, which, when used by an unauthorized party to train a model, allows the data owner to later identify this misuse through black-box queries and statistical hypothesis testing. However, several limitations and challenges accompany this promising approach. Firstly, the success of MIB hinges on the ability to inject backdoor triggers in a manner that remains undetectable by the unauthorized party. While the paper discusses techniques to make the triggers imperceptible, there is an inherent risk that sophisticated adversaries could develop detection methods to identify and neutralize these backdoors. Furthermore, the approach's reliance on statistical hypothesis testing to provide guarantees for inference results, while innovative, may still be vulnerable to model-specific variations and adversarial defenses. However, further exploration is needed to assess the robustness of MIB across various types of datasets and models not covered in the experiments of the paper. Additionally, it is regrettable that the examples from the various datasets experimented in Figure 5 were not also included. -
▶ llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations (Meta, 2023) (Discussion) The paper titled 'Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations' introduces Llama Guard, a model developed to enhance safety in human-AI interactions. The model leverages the Llama2-7b architecture, fine-tuned to classify and mitigate safety risks in both user prompts and AI responses. The core contribution is a comprehensive safety risk taxonomy that guides the classification process, encompassing categories like violence, hate speech, sexual content, and self-harm. Llama Guard outperforms existing moderation tools on benchmarks such as the OpenAI Moderation Evaluation dataset and ToxicChat. The model supports customizable and adaptive use through zero-shot and few-shot learning, and its weights are publicly available for further research and development. However, the work has some limitations. Llama Guard's common sense knowledge is constrained by its training data, which can lead to incorrect judgments. Its performance in non-English languages is not guaranteed, and the quality of fine-tuning labels may not comprehensively cover all policy aspects, leading to subpar performance in some cases. -
▶ Humans vs. Machines in Malware Classification (USENIX, 2023) (Discussion) This study collects and analyzes data from human participants through malware classification games, clearly revealing the differences between human and machine learning models. This is an original approach not seen in previous studies. However, the number of samples used in the experiment is limited to 20, which may be somewhat insufficient to represent the overall malware classification problem. Nevertheless, their results provide practical insights that can be applied directly to training malware analysis experts and improving ML models. For example, they propose integrating dynamic behavior analysis of human experts into ML models. This work provides important insights into how human experts and ML models classify malware, and suggests that a hybrid approach that incorporates human intuition and experience into ML models could be highly effective in future malware defense. -
▶ Anomaly Detection in Aerial Videos With Transformers (IEEE GRSS, 2022) (Discussion) Let there be a video of a vehicle moving backward. An anomaly detection by frame alone would fail to capture any anomalies; instead, the model requires a sequence of frames to identify the anomaly. This paper proposes a Transformer-based anomaly detection in aerial videos from an Unmanned Aerial Vehicles (UAVs). By leveraging the Transformer encoder, they claim that the model can preserve spatiotemporal information. The main contribution comprises an annotated dataset for realistic anomalous events, a benchmark for anomaly detection, and a baseline model ANDT. ANDT exhibits the best performance for certain scenes but does not do so for all scenes tested in the paper. Initial claims suggest that previous models do not preserve spatiotemporal information. However, previous models demonstrate decent performance on the provided dataset. In Figure 4, the MemAE model can infer the vehicle moving backward as an anomalous event, demonstrating its ability to preserve spatiotemporal information. Furthermore, the train test split in Table 1 is odd yet there is no mention why it was done in this way. For example, the test set is larger than the train set in the Bike roundabout scene. -
▶ Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation (USENIX, 2024) (Discussion) In many ways, the compilation process is destructive. High-level constructs, variable names, and comments present in the source code are often lost. Nevertheless, decompilers aim to retrieve the source from a binary. Decompilation results vary between decompilers. So, what is a good decompilation? One could argue that fewer goto instructions indicate good decompilation. Because multiple gotos signify the decompiler failed to structure the control flow. This paper argues that good decompilation is similar to the source code and points out that 3,754 goto instructions are present in the Linux kernel. Since a goto instruction could be intentional by the developer, they argue we should separate intended gotos from unintended gotos. The authors investigate the gcc compiler to identify the transformation passes that cause unintended gotos. Afterward, they propose a structuring algorithm that deoptimizes code to remove unintended gotos while preserving intended ones. SAILR, the proposed structuring algorithm, is implemented on the angr decompiler. SAILR demonstrates decent performence against state-of-the-art decompilers. The paper evaluates SAILR with 7,355 functions from 26 popular Debian packages. However, the number of functions is oddly low, considering the number of packages used. -
▶ DETECTING PRETRAINING DATA FROM LARGE LANGUAGE MODELS (ICLR, 2024) (Discussion) This paper proposes the Membership Inference method 'MIN-K% PROB' in LLM(Large Language Model). The problem presented in this paper is the challenge of detecting pretraining data. This is because LLM developers do not open the data used for pertaining, and since pretraining involves training on data instances once at a time, it makes detection even more difficult. Hence, this paper assumes that the Membership Inference Detector cannot know the distribution of pretraining data. This means that there is no Reference Model (e.g., Shadow Model) used in Membership Inference techniques. Consequently, this paper proposes the Reference-free Method, MIN-K% PROB. The 'Min-K% PROB' method selects outlier tokens with low token probability to create a set, and then uses the average of this set as the threshold for inferring between members and non-members. This method seems to be efficient as it infers membership based on the token-level probability during the pretraining. Additionally, using this method as an evaluation for Unlearning techniques, as demonstrated in the case study, would be beneficial. -
▶ You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content (S&P, 2024) (Discussion) The paper investigates the application of prompt learning with Large Language Models (LLMs) such as GPT-3 and T5 to address the issue of toxic online content. It focuses on three primary tasks: toxicity classification, toxic span detection, and detoxification, showing that prompt learning can perform as well as or even better than traditional models specifically trained for these tasks. Notably, this approach achieves a 10% improvement in toxicity classification and significantly lowers toxicity scores in detoxification tasks while maintaining the semantic integrity of the content. The study is distinguished by its innovative use of prompt tuning for managing toxic content and its thorough evaluation across five model architectures and eight different datasets, which verifies the method's effectiveness and efficiency. However, despite these strengths, the approach's dependency on the quality of datasets and its ability to generalize to unseen toxic content remain potential weaknesses. Furthermore, the complexity involved in designing effective prompts and the possible misuse of the techniques are also concerns. -
▶ ANUBIS: a provenance graph-based framework for advanced persistent threat detection (SAC, 2022) (Discussion) This paper proposes 'ANUBIS', a provenance graph-based supervised APT detection framework. ANUBIS is a method leveraging a BNN(Bayesian Neural Network) to overcome limitations in current APT detection methods using provenance graphs. The authors hypothesize that an attacker cannot breach the probability graph used to train ANUBIS, nor does it violate the event logging mechanism of the host system. These assumptions provide a strong premise for security scenarios, but they have limitations that do not take into account the practical considerations. However, this paper is significant because it has demonstrated that it is effective in predicting the nature of the activity by introducing a new graph neighbor encoding method. -
▶ The Circle Of Life: A Large-Scale Study of The IoT Malware Lifecycle (USENIX, 2021) (Discussion) This paper is a measurement study on the IoT Malware Lifecycle. It collects around 166K Linux-based IoT malware samples collected over a year to measure the characteristics of IoT Malware. The paper concludes with some characteristics that differentiate IoT malware from traditional malware, such as most IoT malware being a variant of the Mirai botnet. However, there does not seem to be unexpected findings. A slight complaint is that figures within the paper had unclear captions. Although the paper explains the concepts, some figures are difficult to understand. -
▶ How Machine Learning Is Solving the Binary Function Similarity Problem (USENIX, 2022) (Discussion) In this paper, a dataset composition for testing the latest BCSD (Binary Code Similarity Detection) techniques is proposed, and analysis of test results for various BCSD models, including the latest GNN-based BCSD and Embedding-based BCSD, has been performed. However, in the evaluation section, it is necessary to divide the results for similarity and similarity lacking into separate tables to enhance clarity. Additionally, there is a lack of explanation for the many abbreviations used in the dataset composition, making it difficult to understand. -
▶ Quark: Controllable Text Generation with Reinforced [Un]learning (NeurIPS, 2022) (Discussion) The paper discussion focuses on an innovative approach to optimizing the reward function of a Reinforcement Learning (RL) model to mitigate undesired behaviors. The authors detail a three-stage algorithm comprising exploration, quantization, and learning, each critical to the model's development. Notably, the paper demonstrates the model's effectiveness through three distinct evaluations, addressing the reduction of toxicity, improvement over negative baselines, and minimization of repetitive actions. This structured evaluation underscores the model's capability to unlearn specific unwanted behaviors, which outperforms both baselines and state-of-the-art RL methods. In this paper, the focus is on defining three undesirable behaviors and proposing three methods (e.g. reward function) to forget each behavior. It seems inefficient to train a model separately for each behavior. -
▶ A Comprehensive Detection Method for the Lateral Movement Stage of APT Attacks (IEEE IoT-J, 2023) (Discussion) This paper designs a multidimensional detection framework to detect lateral movement behavior against APT attacks in an intranet environment based on the SMB protocol. However, contrary to this purpose, the experimental results are unfortunate in that the purpose and results differ because they are about how well malware has been classified. -
▶ Anomaly detection in a forensic timeline with deep autoencoders (JISA, 2021) (Discussion) Systems generate logs to record internal events. These logs are used after a cyber incident for forensic investigation. Unfortunately, the system generates numerous logs during its runtime and majority of the analysis is manual. This paper proposes a deep autoencoder for anomaly detection to assist analyzers by highlighting anomalous events in the Linux kernal system logs. Anomaly detection is a common application for autoencoders. It is utilized to analyze network traffic, logs, etc. It is difficult to identify what is different from previous work and this paper. Furthermore, the proposed model is evaluated with old dataset against ML methods such as SVM. It would have been preferable to see the model’s performance on up-to-date dataset against state-of-the-art models. -
▶ Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures (ASIA CCS, 2023) (Discussion) There are efficient ways to represent binary functions using a tokenizer (e.g., BPE) in assembly language. I'm curious about why the suggestion is to divide them into seven features like Vex-IR instructions, LibcCalls, Constants, etc., and create specific tokenizers for each feature to generate one-hot vectors. I also wonder if this approach is genuinely effective. The explanation about whether this method of tokenization has a positive impact on BCSD (Binary Code Similarity Detection) performance seems insufficient. -
▶ Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers (AAAI, 2024) (Discussion) The paper utilizes adversarial examples to retain the original decision boundary during unlearning. This approach is intriguing yet it raises questions as to what it means to “forget” a data point. The paper states that misclassification signifies forgetting yet “Right to be Forgotten” seems to imply the removal of a certain training data. -
▶ From Grim Reality to Practical Solution: Malware Classification in Real-World Noise (S&P, 2023) Malware datasets inevitably contain incorrect labels due to the shortage of expertise and experience needed for sample labeling. Previous research demonstrated that a training dataset with incorrectly labeled samples would result in inaccurate model learning. To address this problem, researchers have proposed various noise learning methods to offset the impact of incorrectly labeled samples, and in image recognition and text mining applications, these methods demonstrated great success. In this work, we apply both representative and state-of-the-art noise learning methods to real-world malware classification tasks. We surprisingly observe that none of the existing methods could minimize incorrect labels’ impact. Through a carefully designed experiment, we discover that the inefficacy mainly results from extreme data imbalance and the high percentage of incorrectly labeled data samples. As such, we further propose a new noise learning method and name it after MORSE. Unlike existing methods, MORSE customizes and extends a state-of-the-art semi-supervised learning technique. It takes possibly incorrectly labeled data as unlabeled data and thus avoids their potential negative impact on model learning. In MORSE, we also integrate a sample re-weighting method that balances the training data usage in the model learning and thus handles the data imbalance challenge. We evaluate MORSE on both our synthesized and real-world datasets. We show that MORSE could significantly outperform existing noise learning methods and minimize the impact of incorrectly labeled data. -
▶ CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation (EMNLP, 2021) (Abstract) Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently shown to transfer well to Programming Languages (PL) and largely benefit a broad set of code-related tasks. Despite their success, most current methods either rely on an encoder-only (or decoder-only) pre-training that is suboptimal for generation (resp. understanding) tasks or process the code snippet in the same way as NL, neglecting the special characteristics of PL such as token types. We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning. Besides, we propose a novel identifier-aware pre-training task that enables the model to distinguish which code tokens are identifiers and to recover them when they are masked. Furthermore, we propose to exploit the user-written code comments with a bimodal dual generation task for better NL-PL alignment. Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our model can better capture semantic information from code. Our code and pre-trained models are released at https: //github.com/salesforce/CodeT5. -
▶ InCoder: A Generative Model for Code Infilling and Synthesis (EMNLP, 2021) (Abstract) Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce INCODER, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via masking and infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first large generative code model that is able to infill arbitrary regions of code, which we evaluate in a zero-shot setting on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale. Our models and code are publicly released. -
▶ CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (NeurIPS, 2022) (Abstract) Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical limitations. In particular, they often follow a standard supervised fine-tuning procedure to train a code generation model only from the pairs of natural-language problem descriptions and ground-truth programs. Such paradigm largely ignores some important but potentially useful signals in the problem specification such as unit tests, which thus often results in poor performance when solving complex unseen coding tasks. To address the limitations, we propose “CodeRL”, a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, we treat the code-generating LM as an actor network, and introduce a critic network that is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor. During inference, we introduce a new generation procedure with a critical sampling strategy that allows a model to automatically regenerate programs based on feedback from example unit tests and critic scores. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data. Our method not only achieves new SOTA results on the challenging APPS benchmark, but also shows strong zero-shot transfer capability with new SOTA results on the simpler MBPP benchmark. -
▶ Ground Truth for Binary Disassembly is Not Easy (USENIX, 2022) (Abstract) Modern disassembly tools often rely on empirical evaluations to validate their performance and discover their limitations, thus promoting long-term evolvement. To support the empirical evaluation, a foundation is the right approach to collect the ground truth knowledge. However, there has been no unanimous agreement on the approach we should use. Most users pick an approach based on their experience or will, regardless of the properties that the approach presents. In this paper, we perform a study on the approaches to building the ground truth for binary disassembly, aiming to shed light on the right way for the future. We first provide a taxonomy of the approaches used by past research, which unveils five major mechanisms behind those approaches. Following the taxonomy, we summarize the properties of the five mechanisms from two perspectives: (i) the coverage and precision of the ground truth produced by the mechanisms and (ii) the applicable scope of the mechanisms (e.g., what disassembly tasks and what types of binaries are supported). The summarization, accompanied by quantitative evaluations, illustrates that many mechanisms are ill-suited to support the generation of disassembly ground truth. The mechanism best serving today’s need is to trace the compiling process of the target binaries to collect the ground truth information. Observing that the existing tool to trace the compiling process can still miss ground truth results and can only handle x86/x64 binaries, we extend the tool to avoid overlooking those results and support ARM32/AArch64/MIPS32/MIPS64 binaries. We envision that our extension will make the tool a better foundation to enable universal, standard ground truth for binary disassembly. -
▶ Black-box Attacks Against Neural Binary Function Detection (RAID, 2023) (Abstract) Binary analyses based on deep neural networks (DNNs), or neural binary analyses (NBAs), have become a hotly researched topic in recent years. DNNs have been wildly successful at pushing the performance and accuracy envelopes in the natural language and image processing domains. Thus, DNNs are highly promising for solving binary analysis problems that are hard due to a lack of complete information resulting from the lossy compilation process. Despite this promise, it is unclear that the prevailing strategy of repurposing embeddings and model architectures originally developed for other problem domains is sound given the adversarial contexts under which binary analysis often operates. In this paper, we empirically demonstrate that the current state of the art in neural function boundary detection is vulnerable to both inadvertent and deliberate adversarial attacks. We proceed from the insight that current generation NBAs are built upon embeddings and model architectures intended to solve syntactic problems. We devise a simple, reproducible, and scalable black-box methodology for exploring the space of inadvertent attacks – instruction sequences that could be emitted by common compiler toolchains and configurations – that exploits this syntactic design focus. We then show that these inadvertent misclassifications can be exploited by an attacker, serving as the basis for a highly effective black-box adversarial example generation process. We evaluate this methodology against two state-of-the-art neural function boundary detectors: XDA and DeepDi. We conclude with an analysis of the evaluation data and recommendations for ho -
▶ SelectiveTaint: Efficient Data Flow Tracking With Static Binary Rewriting (USENIX, 2021) (Abstract) Taint analysis has been widely used in many security applications such as exploit detection, information flow tracking, malware analysis, and protocol reverse engineering. State-of-theart taint analysis tools are usually built atop dynamic binary instrumentation, which instruments at every possible instruction, and rely on runtime information to decide whether a particular instruction involves taint or not, thereby usually having high performance overhead. This paper presents SELECTIVETAINT, an efficient selective taint analysis framework for binary executables. The key idea is to selectively instrument the instructions involving taint analysis using static binary rewriting instead of dynamic binary instrumentation. At a high level, SELECTIVETAINT statically scans taint sources of interest in the binary code, leverages value set analysis to conservatively determine whether an instruction operand needs to be tainted or not, and then selectively taints the instructions of interest. We have implemented SELECTIVETAINT and evaluated it with a set of binary programs including 16 coreutils (focusing on file I/O) and five network daemon programs (focusing on network I/O) such as nginx web server. Our evaluation results show that the binaries statically instrumented by SELECTIVETAINT has superior performance compared to the state-of-the-art dynamic taint analysis frameworks (e.g., 1.7x faster than that of libdft).