Hey everyone,
JSON (JavaScript Object Notation) has become the standard for data interchange on the web, and like many of you, I use it daily. Its simplicity and human-readable format are great, but this simplicity can be deceiving. I’ve been digging into how subtle differences in JSON parsing can lead to serious security vulnerabilities, especially in today’s complex, polyglot microservice architectures. It’s a fascinating area, and I wanted to share some of my findings.
The Problem: JSON’s “Simplicity” is a Lie!
JSON’s apparent simplicity masks a surprising lack of strict standardization. While RFC 8259 (which obsoleted RFC 7159) defines the grammar, it deliberately leaves several behaviors implementation-defined, especially edge cases like duplicate keys and number precision. This has led to the development of numerous JSON parsers, each with its own quirks and idiosyncrasies. I’ve lost count of the number of times I’ve seen weird parsing behavior in different languages.
In a modern application, a single JSON payload might be processed by multiple services, each using a different JSON parser. For example:
- A request from a mobile app (using the platform’s built-in JSON parser) might be routed through a Node.js API gateway (using JSON.parse).
- The API gateway might forward the data to a backend service written in Python (using json.loads).
- The Python service might store the data in a database that uses its own internal JSON parser.

Think about how many different parsers a single request might go through!
If these parsers interpret the same JSON payload differently, it can lead to inconsistencies and vulnerabilities. And that’s where things get interesting (and potentially dangerous).
Categories of JSON Interoperability Vulnerabilities
I’ve been categorizing JSON interoperability vulnerabilities into several key areas based on my research and bug bounty experiences:
1. Inconsistent Handling of Duplicate Keys
The JSON specification says only that names within an object “SHOULD be unique”; when they aren’t, the behavior is unpredictable and left to the implementation. This is a big problem! It means that different parsers can handle duplicates in different ways. Some parsers might:
- Take the first occurrence of the key.
- Take the last occurrence of the key.
- Combine the values into an array (if the values are compatible).
- Throw an error.
- Silently ignore subsequent duplicates.
Example:
Consider the following JSON payload:

```json
{
  "user": "alice",
  "user": "bob"
}
```

One parser might interpret this as {"user": "alice"}, while another might interpret it as {"user": "bob"}. It’s a recipe for disaster!
Vulnerability Scenario:
Imagine a scenario where a user sends a request to create an account with the following JSON payload:

```json
{
  "username": "alice",
  "username": "admin",
  "role": "user"
}
```

If the parser doing validation takes the first occurrence of the username key, the request looks like an innocent signup as “alice” and passes every check. But if the parser that actually persists the account takes the last occurrence, the account is created as “admin”, a classic privilege escalation through parser disagreement. I’ve actually found this in the wild.
Here’s a Python example to illustrate the different behaviors:
```python
import json
import re

# Example 1: last value taken (Python's default behavior)
json_data1 = '{"user": "alice", "user": "bob"}'
data1 = json.loads(json_data1)
print(f"Python Example 1: {data1}")  # Output: {'user': 'bob'}

# Example 2: simulating "first value taken" behavior with a crude regex
# scan (illustrative only; it won't handle nested objects or arrays)
def first_duplicate_wins(json_string):
    data = {}
    for match in re.finditer(r'"([^"]+)":\s*("[^"]*"|[^,}]+)', json_string):
        key, value = match.groups()
        if key not in data:
            try:
                data[key] = json.loads(value)  # decode the scalar value
            except json.JSONDecodeError:
                data[key] = value
    return data

data2 = first_duplicate_wins(json_data1)
print(f"Python Example 2 (simulated 'first' behavior): {data2}")
```
2. Key Collision: Character Truncation and Comments
Some JSON parsers exhibit unexpected behavior when encountering certain characters or syntax within keys, such as truncation or misinterpretation of comments.
Character Truncation: Certain parsers might truncate keys after encountering specific characters, leading to unintended key collisions. Null bytes are a classic example.
Misinterpreted Comments: While JSON itself doesn’t support comments, some parsers might allow them as an extension. However, inconsistencies in how these comments are handled can lead to vulnerabilities. For instance, a parser might misinterpret a comment as part of a key, leading to unexpected behavior. This is less common, but I’ve seen it in some older or less common parsers.
Vulnerability Scenario:
Consider a JSON parser that truncates keys at a null byte. An attacker could send a payload like this (\u0000 is the legal JSON escape for a null byte):

```json
{
  "admin\u0000": "true",
  "user": "alice"
}
```
If the parser truncates the key, it might store the value as if the key was simply “admin”. If another part of the system checks for the “admin” key, this could lead to an unauthorized privilege escalation. This is a fun one to test for.
Here’s how you might test for null byte truncation in Python (note: Python’s json.loads doesn’t truncate, but other libraries or custom parsers might):
```python
import json

def check_null_byte_truncation(key):
    test_json = f'{{"{key}": "test"}}'
    try:
        data = json.loads(test_json)
        parsed_key = list(data.keys())[0]
        # Check whether the parsed key differs from the original
        if parsed_key != key:
            print(f"Potential truncation with key: {key!r}")
            print(f"Parsed as: {parsed_key!r}")
        else:
            print(f"No truncation detected with key: {key!r}")
    except json.JSONDecodeError as e:
        # Python's strict parser rejects raw control characters inside
        # strings, so the null-byte case lands here instead of truncating
        print(f"JSONDecodeError with key: {key!r} - {e}")

# Test with a raw null byte, then a clean key
check_null_byte_truncation("admin\0")
check_null_byte_truncation("admin")
```
3. JSON Serialization Quirks
JSON serialization is the process of converting data structures (like objects or arrays) into a JSON string. Different programming languages and libraries might implement serialization differently, leading to inconsistencies. This is where you really need to pay attention to the specific libraries being used.
Example:
Consider how different languages might serialize a dictionary/map with non-string keys:
- JavaScript: Coerces keys to strings. For example, {1: "value"} becomes {"1": "value"}.
- Python: json.dumps also coerces basic non-string keys (int, float, bool, None) to strings by default. You only get a TypeError for other key types, such as tuples, unless you pass skipkeys=True.
- Some Libraries: Might emit other data types (e.g., integers) as keys verbatim, producing invalid JSON that can break or confuse parsers on the receiving end that expect only string keys.
Vulnerability Scenario:
If a backend system serializes a JSON object whose integer keys get silently coerced to strings, a frontend that still looks the data up by the integer key won’t find it, potentially leading to errors, fail-open logic, or denial of service.
Here’s a Python example, followed by the JavaScript equivalent:

```python
import json

# Python coerces basic non-string keys (int, float, bool, None) to strings...
data_python = {1: "value"}
json_string_python = json.dumps(data_python)
print(f"Python Serialized: {json_string_python}")  # {"1": "value"}

# ...so after a round-trip, the integer key is gone
parsed = json.loads(json_string_python)
print(1 in parsed, "1" in parsed)  # False True

# Only non-basic key types (e.g., tuples) raise a TypeError
try:
    json.dumps({(1, 2): "value"})
except TypeError as e:
    print(f"Python TypeError: {e}")
```
```html
<script>
// JavaScript coerces all object keys to strings
const data_js = { 1: "value" };
const json_string_js = JSON.stringify(data_js);
console.log(`JavaScript Serialized: ${json_string_js}`); // Output: {"1":"value"}

const parsed_js = JSON.parse(json_string_js);
console.log(`JavaScript Parsed:`, parsed_js);
console.log(`Type of key '1':`, typeof parsed_js["1"]); // string
</script>
```
4. Float and Integer Representation
While JSON supports numbers, there can be subtle differences in how floating-point numbers and integers are represented and parsed across different systems. This can lead to rounding errors or unexpected behavior when converting between different representations. This is a classic source of bugs, especially when dealing with financial data.
Example:
Different languages may use different levels of precision when representing floating-point numbers. This can lead to slight variations in the serialized JSON, which, while seemingly innocuous, can cause issues in systems that rely on exact matching.
Vulnerability Scenario:
Consider a financial application that uses JSON to exchange transaction data. If different systems use different levels of precision for representing currency amounts, it could lead to discrepancies in the amounts processed, potentially resulting in financial losses or fraud.
Here’s a Python example:
```python
import json

# Example: floating-point precision
# These two literals differ in text, but both round to the same IEEE-754
# double, so they are indistinguishable once parsed
number1 = 1.0000000000000000001
number2 = 1.0000000000000000002

json_string1 = json.dumps({"num": number1})
json_string2 = json.dumps({"num": number2})
print(f"JSON String 1: {json_string1}")  # {"num": 1.0}
print(f"JSON String 2: {json_string2}")  # {"num": 1.0}

data1 = json.loads(json_string1)
data2 = json.loads(json_string2)
if data1 == data2:
    print("Numbers are equal after serialization/deserialization")
else:
    print("Numbers are different after serialization/deserialization")
```
5. Permissive Parsing and Other Bugs
Some JSON parsers are more “permissive” than others, meaning they might accept JSON that doesn’t strictly adhere to the standard. This can include:
- Accepting unquoted keys.
- Allowing trailing commas.
- Tolerating syntax errors.
While permissive parsing can be convenient in some cases, it can also create security vulnerabilities. It’s a trade-off, and in security, strictness is generally better.
Example:
A permissive parser will happily accept the following JSON, trailing comma and all:

```json
{
  "user": "alice",
  "role": "user",
}
```

A stricter parser in the same chain will reject it outright with a syntax error. The danger lies in how that disagreement plays out: if the layer doing security checks is the one that errors out and fails open, or if a sloppy parser stops processing at the comma and drops what follows, two components end up acting on different views of the same request.
Vulnerability Scenario:
If a parser allows unquoted keys, an attacker might be able to inject unexpected characters into keys, potentially bypassing input validation or exploiting other vulnerabilities. For example, if a system expects a key to be “id”, an attacker might send {"id\n": 123}, and if the newline character is not properly handled, it could lead to log injection.
Here’s a JavaScript example:

```html
<script>
// Example of an unquoted key (not standard JSON)
try {
  const jsonString = "{id: 123}"; // Unquoted key 'id'
  const parsedData = JSON.parse(jsonString);
  console.log("Parsed data (permissive parser behavior):", parsedData);
} catch (error) {
  console.error("Error parsing JSON:", error); // JSON.parse is strict, so this throws
}

// Example of a trailing comma
try {
  const jsonStringWithComma = '{"key1": "value1", "key2": "value2",}';
  const parsedDataComma = JSON.parse(jsonStringWithComma);
  console.log("Parsed data with trailing comma:", parsedDataComma);
} catch (error) {
  console.error("Error parsing JSON with trailing comma:", error); // also throws
}
</script>
```
Real-World Examples and Case Studies
While it’s tricky to point to many publicly disclosed exploits caused solely by these JSON quirks (they’re usually one link in a longer attack chain), the underlying issues are real. The best-known public example is Apache CouchDB’s CVE-2017-12635, where the Erlang and JavaScript parsers disagreed about duplicate keys, letting attackers grant themselves the _admin role. I’ve seen variations of these in bug bounties and internal security audits, and input validation issues, which are extremely common, often involve JSON.
How to Find These Vulnerabilities in Bug Bounties
Okay, this is the part you’re probably most interested in: how to find these vulnerabilities in bug bounty programs. Here’s my strategy:
1. Identify JSON Endpoints: The first step is to find endpoints that accept JSON input. Look for API endpoints, forms that submit data as JSON, and any other place where the application processes JSON. Burp Suite is your friend here. Pay close attention to the Content-Type header.
2. Fuzz with Modified JSON: This is where the fun begins. Start sending modified JSON payloads to these endpoints. Here are some techniques I use:
   - Duplicate Keys: Send payloads with duplicate keys in different orders. See if you can cause inconsistencies in how the data is processed. For example, send {"user": "alice", "user": "admin"} and then {"user": "admin", "user": "alice"}.
   - Character Truncation: Inject null bytes (\u0000) and other special characters into keys to test for truncation vulnerabilities. Try {"admin\u0000": "true"}.
   - Invalid Types: Send values of unexpected types. For example, if an endpoint expects an integer, send a string or a float.
   - Number Precision: Send very large or very small floating-point numbers, and integers just above 2^53, to test for precision issues.
   - Permissive Parsing: Send JSON with unquoted keys, trailing commas, and other syntax errors to see if the parser is too lenient.
3. Observe the Behavior: Carefully observe how the application responds to your modified JSON payloads. Look for:
   - Errors or crashes.
   - Inconsistent data processing.
   - Unexpected behavior, such as privilege escalation or data corruption.
   - Differences in behavior between different parts of the application. For example, does the API gateway handle your payload differently than the backend service?
4. Use Automation: For large applications, it’s helpful to automate the process of fuzzing JSON input. Tools like Burp Suite Intruder or custom scripts can be used to send a large number of modified JSON payloads automatically. I often write quick Python scripts to generate payloads (see the sketch after this list).
5. Check Different Parsers: If you can, try to identify the specific JSON parsers being used by the application. This can help you tailor your attacks to the specific quirks of those parsers. Sometimes, error messages will leak this information. For example, a Python traceback might reveal that json.loads is being used.
6. Look for Chained Vulnerabilities: JSON vulnerabilities are often part of a chain. For example, a JSON parsing vulnerability might allow you to inject data into a database, which could then be used to exploit a SQL injection vulnerability.
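To make step 4 concrete, here’s the kind of quick-and-dirty payload generator I mean. It’s a hypothetical sketch: interop_mutations and the seed payload are made up for illustration, and you’d feed the output into Burp Intruder or a requests loop:

```python
import json

def interop_mutations(base: dict):
    """Yield interop-focused mutations of a known-good payload.
    Illustrative only; extend with whatever quirks you're hunting."""
    base_str = json.dumps(base)
    key = next(iter(base))
    yield base_str                                            # control case
    yield base_str[:-1] + f', "{key}": "injected"}}'          # duplicate key
    yield base_str.replace(f'"{key}"', f'"{key}\\u0000"', 1)  # NUL escape in key
    yield base_str[:-1] + ',}'                                # trailing comma
    yield json.dumps({key: 9007199254740993})                 # 2**53 + 1

for payload in interop_mutations({"username": "alice"}):
    print(payload)
```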
Mitigation Strategies
So, what can we do to defend against these attacks? Here are some best practices that I recommend:
- Strict Parsing: Use JSON parsers that adhere strictly to the JSON standard and avoid those that are overly permissive. This is your first line of defense.
- Standardized Libraries: Use well-vetted, widely used JSON parsing libraries that are actively maintained and have a good security track record. Don’t roll your own parser unless absolutely necessary (and even then, be extremely careful).
- Input Validation: Always validate and sanitize JSON input on both the client and the server side. Don’t rely on the parser alone to enforce data integrity. Define a schema (e.g., using JSON Schema) and validate the structure and data types of the JSON (see the sketch after this list). This is the most important thing you can do.
- Canonicalization: Where possible, canonicalize JSON before processing it. This means converting it to a standard, consistent format to minimize differences between parsers. For example, ensure keys are consistently ordered and floating-point numbers are represented with a consistent level of precision.
- Test Thoroughly: Test your application with a variety of JSON payloads, including edge cases and potentially malicious inputs, to ensure that different parsers handle them consistently. Use fuzzing tools to automatically generate a wide range of test cases.
- Be Aware of Parser Behavior: Understand the specific behavior of the JSON parsers used in your application, especially when it comes to handling duplicate keys, edge cases, and non-standard JSON. Consult the parser’s documentation and test its behavior.
- Consider Alternative Formats (If Appropriate): If you control both the sending and receiving ends of the communication, consider a more strictly defined data format like Protocol Buffers or Avro, which offer stronger guarantees about data consistency and schema enforcement. That said, JSON remains highly practical for many use cases.
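For the input validation point, here’s roughly what schema enforcement looks like in Python with the third-party jsonschema package (pip install jsonschema); the schema itself is a made-up example:

```python
import json
from jsonschema import validate, ValidationError

# A deliberately strict schema: known keys only, tight types and patterns
schema = {
    "type": "object",
    "properties": {
        "username": {"type": "string", "pattern": "^[a-z0-9_]{3,20}$"},
        "role": {"type": "string", "enum": ["user"]},
    },
    "required": ["username"],
    "additionalProperties": False,  # reject keys you did not expect
}

payload = json.loads('{"username": "alice", "role": "user"}')
try:
    validate(instance=payload, schema=schema)
    print("payload accepted")
except ValidationError as e:
    print(f"payload rejected: {e.message}")
```

Note that additionalProperties: false and the tight username pattern would have stopped both the duplicate-key and null-byte payloads from earlier in this post.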
Important Notes for Bug Bounties:
- Scope: Always stay within the scope of the bug bounty program. Don’t test endpoints or systems that are not explicitly listed in the program’s scope.
- Responsible Disclosure: If you find a vulnerability, report it to the vendor in accordance with the program’s guidelines. Don’t disclose the vulnerability publicly until the vendor has had a chance to fix it.
- Documentation: Document your findings carefully. Include detailed steps to reproduce the vulnerability, as well as any evidence that you have gathered. This will make it easier for the vendor to understand and fix the vulnerability, and it will also increase your chances of getting a good bounty.
Conclusion
JSON’s ubiquity and apparent simplicity make it easy to overlook the potential security risks associated with its interoperability. As applications become more complex and distributed, it’s crucial to be aware of the subtle differences in how JSON is parsed and handled by different systems.
By following the mitigation strategies outlined in this blog post, developers can significantly reduce the risk of JSON interoperability vulnerabilities and build more secure and robust applications. The key takeaway? Treat JSON parsing as a security-sensitive operation.
Validate everything. And happy bug hunting!