Your own Facebook API - logging in

The Facebook API is a powerful tool that gives you an interface for creating games, authorizing users in an application, or building utility applications such as a wall content analyser. But there are a couple of things the API cannot do. For example, you can easily send a message to your friends, but there is no way to receive messages from them. For this and other reasons I decided to scrape the Facebook website in order to create my own API. I used the low-end web interface available at http://mbasic.facebook.com. In this post I will describe how to implement a simple login in Python.

We will need two libraries: requests for making calls to Facebook and HTMLParser for parsing the HTML and extracting useful information. We also need to know how Facebook’s low-end interface works. To find out, go to http://mbasic.facebook.com (make sure to log out first) and use the browser's web inspector to view the HTML. You should see something like this:

<form id="login_form" action="https://mbasic.facebook.com/login.php?refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&amp;lwv=100&amp;refid=8">
  <input autocomplete="off" name="lsd" type="hidden" value="AVrCfvCI" /> 
  <input name="charset_test" type="hidden" value="€,´,€,´,水,Д,Є" /> 
  <input name="version" type="hidden" value="1" /> 
  <input id="ajax" name="ajax" type="hidden" value="0" /> 
  <input id="width" name="width" type="hidden" value="0" /> 
  <input id="pxr" name="pxr" type="hidden" value="0" /> 
  <input id="gps" name="gps" type="hidden" value="0" /> 
  <input id="dimensions" name="dimensions" type="hidden" value="0" /> 
  <input name="m_ts" type="hidden" value="1466018891" /> 
  <input name="li" type="hidden" value="S6xhV6uh5PgJqdeqoQg1mGd-" /> 
  <input class="bi bj bk" name="email" type="text" value="" /> 
  <input class="bi bj bl bm" name="pass" type="password" /> 
  <input class="m n bn bo bp" name="login" type="submit" value="Log In" />
</form>

As you can see, we need to find the form tag with the id “login_form” and extract the name and value attributes of every input field inside it. To do that we create a parser class that derives from HTMLParser. We should end up with something like this:

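from html.parser import HTMLParser  # Python 3; on Python 2 use: from HTMLParser import HTMLParser


class LoginParser(HTMLParser):
    """Collects the action URL and input fields of the form with id "login_form"."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.isLoginForm = False  # True while we are inside the login form
        self.action = None        # value of the form's action attribute
        self.data = {}            # input name -> value, kept per instance

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # .get() avoids a KeyError on forms that have no id attribute
            if attrs.get("id") == "login_form":
                self.isLoginForm = True
                self.action = attrs.get("action")
        elif tag == "input" and self.isLoginForm:
            name = attrs.get("name")
            if name:
                # inputs without a value attribute (e.g. pass) become ""
                self.data[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.isLoginForm:
            self.isLoginForm = False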
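
To try the parser without touching the network, it can be fed a trimmed copy of the form shown earlier. The sketch below is self-contained, so it uses a compact stand-in class (the name FormFields is hypothetical) that follows the same approach as LoginParser; the sample HTML and its values are made up for illustration.

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Hypothetical stand-in for LoginParser: collects inputs of one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.inside = False   # True while inside the wanted form
        self.action = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.inside = True
            self.action = attrs.get("action")
        elif tag == "input" and self.inside and attrs.get("name"):
            # a missing value attribute is stored as an empty string
            self.data[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


# Invented sample page: one unrelated form, then a shortened login form.
SAMPLE = """
<form id="search"><input name="q" value="ignored" /></form>
<form id="login_form" action="https://mbasic.facebook.com/login.php?a=1&amp;b=2">
  <input name="lsd" type="hidden" value="AVrCfvCI" />
  <input name="email" type="text" value="" />
  <input name="pass" type="password" />
</form>
"""

parser = FormFields("login_form")
parser.feed(SAMPLE)
print(parser.action)
print(sorted(parser.data.items()))
```

Note that the &amp; in the action attribute comes back as a plain &, because HTMLParser unescapes attribute values before passing them to handle_starttag.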

Now we need to connect to Facebook, retrieve the login form and send the login request. We can do that with the requests library:

        # excerpt from Facebook.login(self, login, password)
        response = requests.get(self.loginUrl, headers=self.headers)
        parser = LoginParser()
        parser.feed(response.text)
        form = parser.data
        form['email'] = login
        form['pass'] = password
        response = requests.post(parser.action, data=form, cookies=response.cookies, headers=self.headers)
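
The form-filling lines above write the credentials straight into parser.data and post to parser.action as-is. A slightly more defensive variant, sketched here with a hypothetical helper named build_login_request, copies the scraped fields before adding the credentials and resolves the action against the page URL in case the form ever uses a relative path. The form shown earlier uses an absolute URL, so this is only a precaution, and the sample values are invented.

```python
from urllib.parse import urljoin


def build_login_request(page_url, action, fields, email, password):
    """Return (url, payload) for the login POST without mutating `fields`."""
    payload = dict(fields)      # keep the scraped hidden fields intact
    payload["email"] = email
    payload["pass"] = password
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return urljoin(page_url, action), payload


# Invented example values, shaped like the fields scraped from the form.
scraped = {"lsd": "AVrCfvCI", "version": "1", "email": "", "pass": ""}
url, payload = build_login_request(
    "https://mbasic.facebook.com/",
    "/login.php?refid=8",
    scraped,
    "user@example.com",
    "secretpassword",
)
print(url)
print(payload["email"], scraped["email"] == "")
```

The returned pair can then be handed to requests.post(url, data=payload, ...) in the same way as in the snippet above.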

In the login code, we first send an initial request to get the form, then feed the response to the parser and extract the login form data. After that we set the login and password in the data dictionary and send a POST request to Facebook with the form data, the cookies retrieved by the first request, and headers that mimic a desktop browser. Setting the headers is not mandatory, but we do not want to make it obvious to Facebook that a script is doing the job. The whole application should look like this:

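import requests
from html.parser import HTMLParser  # Python 3; on Python 2 use: from HTMLParser import HTMLParser


class Facebook:
    loginUrl = "https://mbasic.facebook.com/"
    # Headers that mimic a desktop browser
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # "br" is left out: requests decodes Brotli only when a brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Referer": "https://mbasic.facebook.com/",
    }

    def login(self, login, password):
        # Fetch the login page and scrape the form fields
        response = requests.get(self.loginUrl, headers=self.headers)
        parser = LoginParser()
        parser.feed(response.text)
        form = parser.data
        form['email'] = login
        form['pass'] = password
        # Post the filled-in form, reusing the cookies from the first request
        response = requests.post(parser.action, data=form, cookies=response.cookies, headers=self.headers)
        print(response.text)


class LoginParser(HTMLParser):
    """Collects the action URL and input fields of the form with id "login_form"."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.isLoginForm = False  # True while we are inside the login form
        self.action = None        # value of the form's action attribute
        self.data = {}            # input name -> value, kept per instance

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # .get() avoids a KeyError on forms that have no id attribute
            if attrs.get("id") == "login_form":
                self.isLoginForm = True
                self.action = attrs.get("action")
        elif tag == "input" and self.isLoginForm:
            name = attrs.get("name")
            if name:
                # inputs without a value attribute (e.g. pass) become ""
                self.data[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.isLoginForm:
            self.isLoginForm = False


fb = Facebook()
# placeholder credentials; replace with your own
fb.login("user@example.com", "secretpassword")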
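
Printing the whole response makes it hard to tell whether the login worked. One rough heuristic, and it is only an assumption about how the site behaves, is that a failed attempt returns a page that still contains the form with id “login_form”. The sketch below checks for that; the class and function names are hypothetical and the sample pages are invented.

```python
from html.parser import HTMLParser


class FormIdFinder(HTMLParser):
    """Records whether a <form> with the given id appears in the page."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "form" and dict(attrs).get("id") == self.form_id:
            self.found = True


def looks_logged_out(html):
    """Heuristic (assumption): the login form is still present in the page."""
    finder = FormIdFinder("login_form")
    finder.feed(html)
    return finder.found


# Invented pages standing in for real responses.
failed_page = '<html><body><form id="login_form" action="/login.php"></form></body></html>'
other_page = '<html><body><form id="composer" action="/post"></form></body></html>'
print(looks_logged_out(failed_page))
print(looks_logged_out(other_page))
```

In the login method, a call such as looks_logged_out(response.text) could stand in for the print call.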