Data Visualization of Telegram messages (Encrypted Chats)

We are going to visualize the timestamps of messages in the Telegram database. This also includes encrypted chats as we analyze the SQLite database of the app.

Obtaining the database

We pull the database of Telegram using the ADB tool. You can read here how this tool works and how to set it up. Make sure your phone is rooted and you set Root access to ADB only. Then you can restart ADB using adb root. Finally you can pull the database to your current working directory using:

1adb pull /data/data/org.telegram.messenger/files/cache4.db

Collecting timestamp information

Using the sqlite3 tool we can get data and output it as CSV file.

1sqlite3 cache4.db -csv -header "SELECT 1;" > timestamps.csv

The timestamps.csv should contain now a single 1. The following queries show how to query the timestamps for encrypted and non-encrypted chats.

Non-Encrypted Chat

For non-encrypted chats you can use:

1SELECT date FROM messages WHERE uid = (
2    SELECT uid FROM users WHERE name LIKE '%lower case name of person%'
3);

Encrypted chat

For encrypted chat we first need to query an other uid. The new uid is longer as it is bit-shifted by 64 to the left. In order to find the chat messages for encrypted chats, we first undo this in the query and then find the corresponding user in the enc_chats table.

1SELECT date FROM messages
2    WHERE mid < 0 AND substr(printf('%X', uid), 0, length(printf('%X', uid)) - 7) = (
3        SELECT printf('%X', uid) FROM enc_chats WHERE name LIKE '%lower case name of person%'
4    );

There are also the data-blobs with column name data which are not further discussed here. Maybe an other post will take a deeper look at the actual message content. We can UNION and sort the results now to get ta complete overview of the timestamps.

TL;DR (show me the command!)

 1sqlite3 cache4.db -csv -header "
 2SELECT * FROM (
 3    SELECT date FROM messages WHERE uid = (
 4        SELECT uid FROM users WHERE name LIKE '%lower case name of person%'
 5    )
 6    UNION
 7    SELECT date FROM messages
 8        WHERE mid < 0 AND substr(printf('%X', uid), 0, length(printf('%X', uid)) - 7) = (
 9            SELECT printf('%X', uid) FROM enc_chats WHERE name LIKE '%lower case name of person%'
10        )
11) ORDER BY date;
12" > timestamps.csv

Creating a Data Visualization

The following script allows you to plot the data using pandas and matplotlib with the kxcd style:

 1import pandas as pd
 2import matplotlib
 3import matplotlib.pyplot as plt
 4import matplotlib.ticker as mticker
 5from matplotlib import patheffects
 6
 7#import numpy
 8#df = pd.DataFrame(numpy.random.randint(1546300800, 1576368000, size=(2000, 1)), columns=['date'])
 9df = pd.read_csv('timestamps.csv')
10df["date"] = df["date"].astype("datetime64[s]")
11
12df = df.groupby(df["date"].dt.month).count()
13print("Chat messages per month:")
14print(df)
15df = df.drop(df.tail(1).index)
16
17plt.xkcd()
18matplotlib.rcParams['path.effects'] = [patheffects.withStroke(linewidth=0)]
19matplotlib.rcParams['font.family'] = 'xkcd'
20
21fig, ax = plt.subplots(figsize=(8, 5))
22
23df.plot(kind="bar", ax=ax, legend=False)
24
25ax.set_xlabel('')
26ax.set_ylabel('')
27
28ax.spines['right'].set_visible(False)
29ax.spines['top'].set_visible(False)
30for item in [fig, ax]:
31    item.patch.set_visible(False)
32
33ax.xaxis.set_ticks_position('bottom')
34ax.xaxis.set_major_formatter(mticker.FixedFormatter(
35    ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']))
36ax.tick_params(axis=u'both', which=u'both', length=0)
37plt.xticks(rotation='horizontal')
38
39ax.set_title("CHAT MESSAGES PER MONTH")
40
41fig.savefig('timestamps.png', dpi=600, transparent=True)

You can also annotate special points using plt.annotate and plt.text:

1plt.annotate(
2    '',
3    xy=(7.5, 250), arrowprops=dict(arrowstyle='simple', fc='black'), xytext=(6.5, 250), annotation_clip=False)
4
5plt.text(6.5, 269, 'FUTURE', fontsize=16)

Here is an example output:

Example visualization from January until December

The data was generated randomly.

More Resources about Telegram Reverse-engineering

You can find more information on the dflab blog about reversing the Telegram database. Thanks for sharing the knowledge! The following section was the key to this post:

if “mid” is negative, “uid” is negative or positive and has a length of about 19 characters[6] – the interlocutor’s data can be find in the “Enc_chats” table by the converted “uid” value. To find the appropriate contact in the “Enc_chats” table, you need to convert the decimal “uid” number to hexadecimal, then cut off the last 8 zeros from the received number and convert the eight-digit hexadecimal number back to decimal;

Do you have questions? Send an email to max@maxammann.org