Robert Putt



Spying on the kids NSA style


Introduction

Recently my daughter has been growing increasingly aware of all the cool things she can do on the computer and the various other devices around the home, such as our tablets. As a parent this is slightly horrifying, because the internet is a scary place to let your little bundle of joy loose in. It's not that I don't trust my kid; it's more that I don't trust the internet or those she may be browsing with. It's all too easy to stumble from a perfectly suitable website onto age-inappropriate content, or to be led astray by school friends etc... Time for a geek dad's solution...

Requirements

  • Apply filtering for undesirable domains / URLs (e.g. adult / violent / gambling content)
  • Apply filtering by keyword within content
  • Log which websites are visited
  • Limit when internet is accessible (e.g. not after bed time)
  • Able to be used on any network device e.g. Kindle / iPad as well as PCs

Solution

    Currently my network consists of a modem supplied by my ISP, a Debian host acting as a router with iptables & DHCP, a switch, a WiFi access point and the clients. To facilitate the filtering and logging I installed Squid onto the router host to proxy internet traffic.

    apt-get install squid3
    Because of the large number of commented lines in the default config file I decided to strip out any blank or commented lines before editing.

    sed -i.orig -e '/^\s*$/d' -e '/^\s*#/d' /etc/squid/squid.conf
    Next I performed a basic configuration of Squid in my favourite editor to allow my subnets to use the proxy, as well as configuring caching both in memory and on disk to help serve up repeat content more quickly.

    acl localnet src 172.20.1.0/24
    acl SSL_ports port 443
    acl CONNECT method CONNECT
    http_access allow localhost manager
    http_access deny manager
    http_access allow localnet
    http_access deny all
    http_port 3128
    chunked_request_body_max_size 4096 KB
    maximum_object_size_in_memory 1024 KB
    maximum_object_size 256 MB
    cache_mem 256 MB
    cache_dir ufs /misc/cache 15000 16 256
    cache_store_log /var/log/squid/store.log
    coredump_dir /var/spool/squid
    refresh_pattern ^ftp:		1440	20%	10080
    refresh_pattern ^gopher:	1440	0%	1440
    refresh_pattern -i (/cgi-bin/|\?) 0	0%	0
    refresh_pattern .		0	20%	4320
    quick_abort_min -1 KB
    read_ahead_gap 50 MB 
    positive_dns_ttl 30 second
    negative_dns_ttl 1 second 
    minimum_expiry_time 600 seconds
    always_direct allow all
    
    After restarting Squid I pointed my browser to the proxy server and browsed a selection of sites to confirm functionality. Once I could see my traffic in the logs I decided to move on to configuring some basic content filtering capabilities.
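
    The same sanity check can be done from the command line with curl on another machine on the LAN. A quick sketch, assuming the router/proxy answers on 172.20.1.254 (the address used for the block page later on):

    curl -x http://172.20.1.254:3128 -I http://example.com/
    tail -f /var/log/squid/access.log      # run on the proxy host; the request should show up here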



    Domain & URL Filtering

    The first layer of filtering involved installing SquidGuard, which restricts access based on domains and URLs as well as supporting time constraints and other access controls. To get started I installed the relevant packages from Debian's repository and hunted for a free-to-use filtering database; the MESD blacklists seemed to be the most comprehensive.

    apt-get install squidguard
    wget http://squidguard.mesd.k12.or.us/blacklists.tgz
    tar -xvf blacklists.tgz
    cp -r blacklists/* /var/lib/squidguard/db/
    
    Then I updated the squidGuard configuration in /etc/squidguard/squidGuard.conf to match my requirements. This configuration should allow access only during the desired times and only to content which is not in any of the blocked categories.

    #
    # CONFIG FILE FOR SQUIDGUARD
    #
    
    dbhome /var/lib/squidguard/db
    logdir /var/log/squid
    
    #
    # TIME RULES:
    # abbrev for weekdays:
    # s = sun, m = mon, t = tue, w = wed, h = thu, f = fri, a = sat
    
    time allowed_hours {
            weekly mtwhf 15:00 - 20:00 		# Only allowed after school and before bed weekdays.
            weekly sa 08:00 - 20:00 		# Allowed all day at weekends.
    }
    
    #
    # DESTINATION CLASSES:
    #
    
    dest good {
    }
    
    dest local {
    }
    
    dest ads {
            domainlist      ads/domains
            urllist         ads/urls
    }
    
    dest aggressive {
            domainlist      aggressive/domains
            urllist         aggressive/urls
    }
    
    dest audio-video {
            domainlist      audio-video/domains
            urllist         audio-video/urls
    }
    dest drugs {
            domainlist      drugs/domains
            urllist         drugs/urls
    }
    
    dest gambling {
            domainlist      gambling/domains
            urllist         gambling/urls
    }
    
    
    dest hacking {
            domainlist      hacking/domains
            urllist         hacking/urls
    }
    
    dest mail {
            domainlist      mail/domains
    }
    
    dest porn {
            domainlist      porn/domains
            urllist         porn/urls
    }
    
    dest proxy {
            domainlist      proxy/domains
            urllist         proxy/urls
    }
    
    dest redirector {
            domainlist      redirector/domains
            urllist         redirector/urls
    }
    
    dest spyware {
            domainlist      spyware/domains
            urllist         spyware/urls
    }
    
    dest suspect {
            domainlist      suspect/domains
            urllist         suspect/urls
    }
    
    
    dest violence {
            domainlist      violence/domains
            urllist         violence/urls
    }
    
    dest warez {
            domainlist      warez/domains
            urllist         warez/urls
    }
    
    
    acl {
            # Allow access to the internet, except the blocked categories, during allowed hours; otherwise allow no access at all, and redirect blocked content to the block page.
            default within allowed_hours {
                    pass !ads !aggressive !audio-video !drugs !gambling !hacking !mail !porn !proxy !redirector !spyware !suspect !violence !warez all
                    redirect       302:http://172.20.1.254/ 
            } else {
            	pass none
            }
    }
    
    To function correctly, and for performance, squidGuard must compile the text-based blocklists into a binary format; this is done with the 'squidGuard -bdC all' command and may take several minutes. Ownership of the resulting db files must then be handed to the user Squid runs as, for example 'chown -R squid:squid /var/lib/squidguard/db'. Following this we can instruct Squid to use squidGuard to rewrite URLs for filtering purposes by adding the following line to the /etc/squid/squid.conf config file.

    url_rewrite_program /usr/bin/squidGuard
    
    Once complete, Squid should be restarted and a URL from the blocklist tested; all being well the request should be redirected to the block page rather than the page being displayed. However, as you continue reading you'll soon realise this is by no means foolproof and more needs to be done.

    service squid restart
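
    A rough way to test without a browser is curl through the proxy against a domain from one of the blocked categories (the domain below is just a placeholder, substitute something that actually appears in the MESD lists):

    curl -x http://172.20.1.254:3128 -I http://blocked-domain.example/
    # a working filter answers with a 302 redirect to http://172.20.1.254/ rather than the page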
    




    SSL Intercept

    The problem is that if you try to access the same page over an SSL connection it still loads. Obviously this is a super easy way to circumvent the filter, especially considering most websites encourage SSL usage by default these days. To allow inspection, filtering and logging of content served over SSL, Squid needs to stop simply serving a tunnel to the SSL-enabled content and start performing a Man In The Middle / SSL intercept so the filters can be applied. Of course, this raises eyebrows as it instantly breaks the trust model of SSL, and for this reason I would not suggest looking at sensitive content on hosts using the proxy once the next section of configuration is in place.

    As part of the SSL intercept the client requests an SSL URL such as https://www.google.com, and the proxy server fetches the site on its behalf, terminating the SSL connection and decrypting it into plain text. It then performs the relevant filtering and, once satisfied, issues a new certificate for the site using a local certificate authority and passes the re-encrypted content back to the client. If the client has the local certificate authority's root certificate in its trusted list it will display the page without error or warning; of course some modern SSL features, such as extended validation certificates, will not work with such a configuration.



    To begin performing the man in the middle, Squid's SSL-bump needs to be configured. Before updating squid.conf, the root certificate and private key for the local certificate authority need to be generated in a suitable location.

    cd /var/lib/squid
    mkdir certs
    cd certs
    openssl req -new -newkey rsa:2048 -sha256 -days 365 -nodes -x509 -keyout squid_ca.pem  -out squid_ca.pem -extensions v3_ca
    chown -R squid:squid *
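
    Since the key and certificate share one file it is worth confirming the combined PEM actually contains a readable CA certificate before wiring it into Squid; openssl skips over the key block and prints the certificate details:

    openssl x509 -in squid_ca.pem -noout -subject -dates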
    
    Now the /etc/squid/squid.conf file can be updated to enable SSL-bump.

    sed -i 's|http_port 3128|http_port 3128 ssl-bump cert=/var/lib/squid/certs/squid_ca.pem key=/var/lib/squid/certs/squid_ca.pem generate-host-certificates=on dynamic_cert_mem_cache_size=4MB|g' /etc/squid/squid.conf
    echo "ssl_bump stare all" >> /etc/squid/squid.conf
    echo "ssl_bump bump all" >> /etc/squid/squid.conf
    echo "ssl_bump server-first all" >> /etc/squid/squid.conf
    service squid restart
    
    Following the Squid restart the proxy is tested again; although non-SSL traffic still works fine, any SSL traffic now results in a certificate error because the local certificate authority is not recognised by the browser.
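
    The same failure is easy to reproduce with curl before any trust has been distributed; the exact message varies by curl version, but verification fails and curl exits with code 60:

    curl -x http://172.20.1.254:3128 https://www.google.com/
    # fails with an SSL certificate verification error because the issuing CA is unknown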



    This can be fixed by copying the root certificate of the local certificate authority to each client's keychain and marking it as trusted. If using Active Directory to manage users and hosts in a corporate environment the certificate can be pushed out via group policy; in my case, however, I manually copied the certificate to each client machine. Once trust for the CA has been established on the client we can try loading the page again and the certificate error is gone.
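
    How the certificate gets trusted varies by client OS; as one example, on a Debian or Ubuntu client the CA can be added system-wide like this. Note that squid_ca.pem also contains the private key, so only the certificate part should ever leave the proxy (and update-ca-certificates insists on a .crt extension):

    # on the proxy: extract just the certificate from the combined pem
    openssl x509 -in /var/lib/squid/certs/squid_ca.pem -out squid_ca.crt
    # on the client, after copying squid_ca.crt over:
    cp squid_ca.crt /usr/local/share/ca-certificates/
    update-ca-certificates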



    The logs now show every request the client makes, including the full URLs of SSL requests, rather than just a line indicating a tunnelled connection.
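
    Roughly speaking the difference in /var/log/squid/access.log looks like this (illustrative lines, not copied from my logs): before bumping, an HTTPS site only appears as a CONNECT tunnel; afterwards the full decrypted URL is logged:

    1514764800.123    245 172.20.1.10 TCP_TUNNEL/200 3912 CONNECT www.example.com:443 - HIER_DIRECT/93.184.216.34 -
    1514764800.456    245 172.20.1.10 TCP_MISS/200 4512 GET https://www.example.com/some/page - HIER_DIRECT/93.184.216.34 text/html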



    Keyword Filtering

    The next requirement on my feature list was the ability to block web pages based on keywords present within their content. This is important because far more content gets published than could ever feasibly be added to a block list; whilst the block list covers known common sites with bad content, it is useful to also block inappropriate content on the fly. Of course, by doing this I accept that false positives are likely to happen and may even be a regular occurrence.

    For this task I chose E2Guardian, which provides an Internet Content Adaptation Protocol (ICAP) hook allowing Squid to send it web site content to be checked against a rule list; it then returns a verdict on whether to serve the content or not. The default configuration for E2Guardian provides some probably over-the-top filtering out of the box, however it is highly customisable via its config files...

    wget https://github.com/e2guardian/e2guardian/releases/download/v3.5.0/e2guardian_3.5.0_jessie_amd64.deb
    dpkg -i e2guardian_3.5.0_jessie_amd64.deb
    apt-get -f install
    

    Next the ICAP hook to E2Guardian is configured by adding the following lines to /etc/squid/squid.conf.

    icap_enable on
    icap_service service_req reqmod_precache bypass=0 icap://127.0.0.1:1344/request
    icap_service service_resp respmod_precache bypass=0 icap://127.0.0.1:1344/response
    adaptation_access service_req allow all
    adaptation_access service_resp allow all
    icap_send_client_ip on
    adaptation_masterx_shared_names X-ICAP-E2G
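
    Before restarting Squid it is worth checking that E2Guardian is running and listening for ICAP connections (the service name here is assumed from the Debian package):

    service e2guardian restart
    netstat -lntp | grep 1344       # expect e2guardian listening on 127.0.0.1:1344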
    

    After restarting Squid via the service squid restart command we can check whether E2Guardian is blocking web content based on keywords... Let's try a Google search for some pornography... All being well we should be presented with the E2Guardian block page.
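
    The same check can be scripted with curl through the proxy; any page whose body trips the banned phrase lists should come back as E2Guardian's HTML block page rather than the original content (the URL below is a placeholder):

    curl -s -x http://172.20.1.254:3128 http://a-page-with-banned-words.example/ | head -n 20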



    As you can see the site is blocked, but the page also shows the regular expression which triggered the block. That's perfect for debugging, but it essentially shows the user a dictionary of the words which are blocked, which for your use case may not be so great; of course, by editing the configuration you can use custom block pages with E2Guardian.



    Viewing Logs

    Last but not least I wanted some sort of log of what the kid has been up to on the internet. Of course, Squid is already logging everything into its access log, but that's not super friendly to look at, so here are a few tools which can help...

    SquidView
    SquidView allows live viewing of the sites currently flowing through the proxy; it is terminal based and has a "top"-like feel to it. Its use case is probably limited, but it's handy if you want to spy in real time on what is going on.

    Pros: 0 configuration
    Cons: Limited use case
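
    Getting it running is about as simple as it gets; it reads Squid's access log and gives a live, top-like view in the terminal:

    apt-get install squidview
    squidview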



    SARG - Squid Analysis Report Generator
    SARG allows you to analyse the Squid access log on a periodic cron job and output awesome HTML reports which let you drill down into the log data by date / time, client IP and domain. Personally I find this more useful than SquidView as it lets you view data after the fact without having to grep the logs in the terminal, but the search capability is very limited (Ctrl+F in the browser).

    Pros: Easy configuration, Easy to navigate HTML reports
    Cons: No search functionality
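
    A minimal way to get reports out of it, assuming a local web server to publish the HTML (the output path below is just an example):

    apt-get install sarg
    # in /etc/sarg/sarg.conf point sarg at the proxy log and pick an output directory, e.g.:
    #   access_log /var/log/squid/access.log
    #   output_dir /var/www/html/squid-reports
    sarg        # generate a report now; run the same command from cron for daily reports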



    Logstash + ElasticSearch + Kibana
    Using the ELK stack would provide the most flexibility when it comes to shipping, storing and analysing log data, however its configuration is a lot more complex than the previous tools. Rather than go into depth about how to do it here, I'd recommend Thomas Decaux's article, where he covers configuring Squid with the ELK stack.

    Pros: Fantastic search, Scalable, Very tasty customisable dashboards
    Cons: More complex to get up and running / maintain



    Conclusion

    Overall I managed to achieve what I set out to: safe, logged internet access for the kid. However, along the way it did start to feel less like a simple filtering system and more like a complete spy setup, especially given the use of SSL bump and the access logging. I certainly wouldn't trust any host subject to the proxy for sensitive browsing, as the MITM on SSL is a potential point of exploit. I guess now I'll distribute my rogue CA to the devices my kid uses, update their proxy configurations and see what kind of data I collect. Watch this space for some analysis of the data in future (and maybe some automated alerting, e.g. when there is an increasing trend of blocked content).