<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Python Tips]]></provider_name><provider_url><![CDATA[http://pythontips.com]]></provider_url><author_name><![CDATA[phihag]]></author_name><author_url><![CDATA[https://pythontips.com/author/phihag/]]></author_url><title><![CDATA[Storing and Loading Data with&nbsp;JSON]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>We&#8217;ve already <a href="https://freepythontips.wordpress.com/2013/08/02/what-is-pickle-in-python/">learned about pickle</a>, so why do we need another way to (de)serialize Python objects to (or from) disk or a network connection? There are three major reasons to prefer JSON over pickle:</p>
<ul>
<li>When you&#8217;re unpickling data, you&#8217;re essentially allowing your data source to <a href="http://blog.nelhage.com/2011/03/exploiting-pickle/">execute arbitrary Python commands</a>. If the data is trustworthy (say, stored in a sufficiently protected directory), that may not be a problem, but it&#8217;s easy to accidentally leave a file unprotected (or to read something from the network). In these cases, you want to load data, not execute potentially malicious Python code!</li>
<li>Pickled data is hard for humans to read and virtually impossible to write by hand. For example, the pickled version of <code>{"answer": [42]}</code> (in protocol 0, the Python 2 default) looks like this:
<pre><code>(dp0
S'answer'
p1
(lp2
I42
as.
</code></pre>
</li>
</ul>
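<p>In contrast, the JSON representation of <code>{"answer": [42]}</code> is &#8230; <code>{"answer": [42]}</code>. If you can read Python, you can read JSON, since most JSON is also valid Python code (the main exceptions are the literals <code>true</code>, <code>false</code> and <code>null</code>, which Python spells <code>True</code>, <code>False</code> and <code>None</code>).</p>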
<ul>
<li>Pickle is Python-specific. In fact, by default, the bytes generated by Python 3&#8217;s pickle cannot be read by a Python 2.x application! JSON can be read by virtually any programming language &#8211; just scroll down on the <a href="http://json.org/">official homepage</a> to see <a href="http://jackson.codehaus.org/">implementations</a> <a href="http://james.newtonking.com/pages/json-net.aspx">in</a> <a href="http://www.ecma-international.org/publications/standards/Ecma-262.htm">all</a> <a href="http://docs.python.org/library/json.html">major</a> <a href="http://www.php.net/releases/5_2_0.php">and</a> <a href="http://json.rubyforge.org/">some</a> <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/json">minor</a> <a href="http://golang.org/pkg/encoding/json/">languages</a>.</li>
</ul>
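<p>To illustrate the interoperability point, here is a minimal sketch that parses a JSON text as a program in another language might emit it. The payload string and its field names are made up for this example. Note how the JSON literals <code>true</code>, <code>false</code> and <code>null</code> come back as <code>True</code>, <code>False</code> and <code>None</code>.</p>

```python
import json

# Hypothetical payload, as a JavaScript or Go service might produce it.
payload = '{"active": true, "deleted": false, "nickname": null, "scores": [1, 2.5]}'

data = json.loads(payload)
print(data["active"], data["deleted"], data["nickname"])
# output: True False None
print(data["scores"])
# output: [1, 2.5]
```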
<p>So how do you get the JSON representation of an object? It&#8217;s simple: just call <a href="http://docs.python.org/dev/library/json.html#json.dumps"><code>json.dumps</code></a>:</p>
<pre><code>import json
obj = {u"answer": [42.2], u"abs": 42}
print(json.dumps(obj))
# output:  {"answer": [42.2], "abs": 42}
</code></pre>
<p>Often, you want to write to a file or network stream. In both Python 2.x and 3.x you can call <a href="http://docs.python.org/dev/library/json.html#json.dump"><code>dump</code></a> to do that, but in 3.x the output must be a character stream, whereas 2.x expects a byte stream.</p>
<p>Let&#8217;s look at how to load what we wrote. Fittingly, the functions for loading are called <code>loads</code> (to load from a string) and <code>load</code> (to load from a stream):</p>
<pre><code>import json
obj_json = u'{"answer": [42.2], "abs": 42}'
obj = json.loads(obj_json)
print(repr(obj))
# output (Python 3):  {'answer': [42.2], 'abs': 42}
</code></pre>
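<p>The stream variants <code>dump</code> and <code>load</code> can be sketched as a round trip through a file. This assumes Python 3, where the file is opened in text mode; the temporary directory and the file name <code>data.json</code> are placeholders chosen for the example.</p>

```python
import json
import os
import tempfile

obj = {"answer": [42.2], "abs": 42}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.json")  # hypothetical file name

    # Python 3: open in text mode so json.dump gets a character stream.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f)

    # Read the same file back with json.load.
    with open(path, "r", encoding="utf-8") as f:
        restored = json.load(f)

print(restored == obj)
# output: True
```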
<p>When the objects we load and store grow larger, we puny humans often need some hints on where a new sub-object starts. To get these, simply pass an indent size, like this:</p>
<pre><code>import json
obj = {u"answer": [42.2], u"abs": 42}
print(json.dumps(obj, indent=4))
</code></pre>
<p>Now the output will be nicely indented:</p>
<pre><code>{
    "answer": [
        42.2
    ],
    "abs": 42
}
</code></pre>
<p>I often use this indentation feature to debug complex data structures.</p>
<p>The price of JSON&#8217;s interoperability is that we cannot store arbitrary Python objects. In fact, JSON can only store the following objects:</p>
<ul>
<li>character strings</li>
<li>numbers</li>
<li>booleans (<code>True</code>/<code>False</code>)</li>
<li><code>None</code></li>
<li>lists</li>
<li>dictionaries with character string keys</li>
</ul>
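<p>A few Python values that are not on this list are still accepted, but they do not survive a round trip unchanged. The sketch below (with made-up sample data, run on Python 3.7 or newer) shows two such cases: a tuple is written as a JSON array and comes back as a list, and an integer dictionary key is written as a string.</p>

```python
import json

original = {"point": (1, 2), 3: "three", "flag": True, "missing": None}

text = json.dumps(original)
print(text)
# output: {"point": [1, 2], "3": "three", "flag": true, "missing": null}

restored = json.loads(text)
print(restored["point"])   # the tuple comes back as a list
# output: [1, 2]
print("3" in restored, 3 in restored)   # the integer key became a string
# output: True False
```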
<p>Every object that&#8217;s not one of these must be converted &#8211; that includes <em>every</em> object of a custom class. Say we have an object <code>alice</code> as follows:</p>
<pre><code>class User(object):
    def __init__(self, name, password):
        self.name = name
        self.password = password
alice = User('Alice A. Adams', 'secret')
</code></pre>
<p>then converting this object to JSON will fail:</p>
<pre><code>&gt;&gt;&gt; import json
&gt;&gt;&gt; json.dumps(alice)
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
  File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.3/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: &lt;__main__.User object at 0x7f2eccc88150&gt; is not JSON serializable
</code></pre>
<p>Fortunately, there is a simple hook for conversion: define a function and pass it as the <code>default</code> argument:</p>
<pre><code>def jdefault(o):
    return o.__dict__
print(json.dumps(alice, default=jdefault))
# outputs (Python 3.7+): {"name": "Alice A. Adams", "password": "secret"}
</code></pre>
<p><code>o.__dict__</code> is a simple catch-all for user-defined objects, but we can also add support for other objects. For example, let&#8217;s add support for <a href="http://docs.python.org/dev/library/stdtypes.html#set">sets</a> by treating them like lists:</p>
<pre><code>def jdefault(o):
    if isinstance(o, set):
        return list(o)
    return o.__dict__

pets = set([u'Tiger', u'Panther', u'Toad'])
print(json.dumps(pets, default=jdefault))
# outputs (element order may vary): ["Tiger", "Panther", "Toad"]
</code></pre>
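<p>One possible refinement, sketched here under the assumption of Python 3.7 or newer: the function name <code>jdefault_strict</code> and the <code>team</code> data are invented for this example. It sorts sets so the output is stable, and it raises <code>TypeError</code> for objects that have no <code>__dict__</code> instead of failing with an <code>AttributeError</code>.</p>

```python
import json

class User(object):
    def __init__(self, name, password):
        self.name = name
        self.password = password

def jdefault_strict(o):
    # Hypothetical variant of jdefault, not from the original post.
    if isinstance(o, (set, frozenset)):
        return sorted(o)           # sorted gives a stable element order
    if hasattr(o, "__dict__"):
        return o.__dict__
    # Mirror the json module's behaviour for unsupported types.
    raise TypeError(repr(o) + " is not JSON serializable")

team = {"members": [User("Alice A. Adams", "secret")], "pets": {"Tiger", "Toad"}}
result = json.dumps(team, default=jdefault_strict)
print(result)
# output: {"members": [{"name": "Alice A. Adams", "password": "secret"}], "pets": ["Tiger", "Toad"]}
```

<p>Sorting only works when the set elements can be compared with each other, which is the case for the strings used here.</p>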
<p>For more options and details (<code>ensure_ascii</code> and <code>sort_keys</code> may be interesting options to set), have a look at the <a href="http://docs.python.org/dev/library/json.html">official documentation for the <code>json</code> module</a>. The <code>json</code> module is available by default in Python 2.6 and newer; before that, you can use <a href="https://pypi.python.org/pypi/simplejson/">simplejson</a> <a href="https://freepythontips.wordpress.com/2013/07/30/make-your-programs-compatible-with-python-2-and-3-at-the-same-time/">as a fallback</a>.</p>
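<p>As a rough illustration of the two options just mentioned, the following sketch (Python 3, with a made-up sample object) compares the default output with <code>sort_keys=True</code> and <code>ensure_ascii=False</code>. By default, non-ASCII characters are escaped and keys keep their insertion order.</p>

```python
import json

obj = {"name": "Zoë", "abs": 42}

compact = json.dumps(obj)
print(compact)
# output: {"name": "Zo\u00eb", "abs": 42}

readable = json.dumps(obj, sort_keys=True, ensure_ascii=False)
print(readable)
# output: {"abs": 42, "name": "Zoë"}
```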
]]></html></oembed>