We are going to visualize the timestamps of messages in the Telegram database. This also includes encrypted chats as we analyze the SQLite database of the app.
Obtaining the database
We pull the database of Telegram using the ADB tool. You can read here how this tool works and how to set it up. Make sure your phone is rooted and you set Root access
to ADB only
. Then you can restart ADB using adb root
. Finally you can pull the database to your current working directory using:
1adb pull /data/data/org.telegram.messenger/files/cache4.db
Collecting timestamp information
Using the sqlite3 tool we can get data and output it as CSV file.
1sqlite3 cache4.db -csv -header "SELECT 1;" > timestamps.csv
The timestamps.csv
should contain now a single 1. The following queries show how to query the timestamps for encrypted and non-encrypted chats.
Non-Encrypted Chat
For non-encrypted chats you can use:
1SELECT date FROM messages WHERE uid = (
2 SELECT uid FROM users WHERE name LIKE '%lower case name of person%'
3);
Encrypted chat
For encrypted chat we first need to query an other uid
. The new uid
is longer as it is bit-shifted by 64 to the left. In order to find the chat messages for encrypted chats, we first undo this in the query and then find the corresponding user in the enc_chats
table.
1SELECT date FROM messages
2 WHERE mid < 0 AND substr(printf('%X', uid), 0, length(printf('%X', uid)) - 7) = (
3 SELECT printf('%X', uid) FROM enc_chats WHERE name LIKE '%lower case name of person%'
4 );
There are also the data-blobs with column name data
which are not further discussed here. Maybe an other post will take a deeper look at the actual message content.
We can UNION and sort the results now to get ta complete overview of the timestamps.
TL;DR (show me the command!)
1sqlite3 cache4.db -csv -header "
2SELECT * FROM (
3 SELECT date FROM messages WHERE uid = (
4 SELECT uid FROM users WHERE name LIKE '%lower case name of person%'
5 )
6 UNION
7 SELECT date FROM messages
8 WHERE mid < 0 AND substr(printf('%X', uid), 0, length(printf('%X', uid)) - 7) = (
9 SELECT printf('%X', uid) FROM enc_chats WHERE name LIKE '%lower case name of person%'
10 )
11) ORDER BY date;
12" > timestamps.csv
Creating a Data Visualization
The following script allows you to plot the data using pandas and matplotlib with the kxcd style:
1import pandas as pd
2import matplotlib
3import matplotlib.pyplot as plt
4import matplotlib.ticker as mticker
5from matplotlib import patheffects
6
7#import numpy
8#df = pd.DataFrame(numpy.random.randint(1546300800, 1576368000, size=(2000, 1)), columns=['date'])
9df = pd.read_csv('timestamps.csv')
10df["date"] = df["date"].astype("datetime64[s]")
11
12df = df.groupby(df["date"].dt.month).count()
13print("Chat messages per month:")
14print(df)
15df = df.drop(df.tail(1).index)
16
17plt.xkcd()
18matplotlib.rcParams['path.effects'] = [patheffects.withStroke(linewidth=0)]
19matplotlib.rcParams['font.family'] = 'xkcd'
20
21fig, ax = plt.subplots(figsize=(8, 5))
22
23df.plot(kind="bar", ax=ax, legend=False)
24
25ax.set_xlabel('')
26ax.set_ylabel('')
27
28ax.spines['right'].set_visible(False)
29ax.spines['top'].set_visible(False)
30for item in [fig, ax]:
31 item.patch.set_visible(False)
32
33ax.xaxis.set_ticks_position('bottom')
34ax.xaxis.set_major_formatter(mticker.FixedFormatter(
35 ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']))
36ax.tick_params(axis=u'both', which=u'both', length=0)
37plt.xticks(rotation='horizontal')
38
39ax.set_title("CHAT MESSAGES PER MONTH")
40
41fig.savefig('timestamps.png', dpi=600, transparent=True)
You can also annotate special points using plt.annotate
and plt.text
:
1plt.annotate(
2 '',
3 xy=(7.5, 250), arrowprops=dict(arrowstyle='simple', fc='black'), xytext=(6.5, 250), annotation_clip=False)
4
5plt.text(6.5, 269, 'FUTURE', fontsize=16)
Here is an example output:
The data was generated randomly.
More Resources about Telegram Reverse-engineering
You can find more information on the dflab blog about reversing the Telegram database. Thanks for sharing the knowledge! The following section was the key to this post:
if “mid” is negative, “uid” is negative or positive and has a length of about 19 characters[6] – the interlocutor’s data can be find in the “Enc_chats” table by the converted “uid” value. To find the appropriate contact in the “Enc_chats” table, you need to convert the decimal “uid” number to hexadecimal, then cut off the last 8 zeros from the received number and convert the eight-digit hexadecimal number back to decimal;