August 6, 2011

Caching Youtube Using Squid Caching Proxy


I'm keeping this post short because I'm quite busy right now. But I don't want to disappoint my friend Piju, who asked me quite a while ago (sorry, heheh) for my changes to his previous squid.conf for caching YouTube. His version stopped working after YouTube changed their video URLs.

Here is my new /etc/squid/squid.conf:
acl all src all
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl SSL_ports port 443
acl Safe_ports port 80      # http
acl Safe_ports port 21      # ftp
acl Safe_ports port 443     # https
acl Safe_ports port 70      # gopher
acl Safe_ports port 210     # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280     # http-mgmt
acl Safe_ports port 488     # gss-http
acl Safe_ports port 591     # filemaker
acl Safe_ports port 777     # multiling http
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localnet
http_access deny all
icp_access allow localnet
icp_access deny all
http_port 31288
hierarchy_stoplist cgi-bin ?
cache_mem 2048 MB
maximum_object_size_in_memory 1024 KB
cache_dir ufs /disk2-cache/var/cache 150000 16 256
cache_dir ufs /disk1-1/squid-cache 150000 16 256
maximum_object_size 128 MB
access_log /disk2-cache/var/logs/access.log squid
cache_log /disk2-cache/var/logs/cache.log
cache_store_log /disk2-cache/var/logs/store.log
pid_filename /disk2-cache/var/logs/squid.pid
netdb_filename /disk2-cache/var/logs/netdb.state
storeurl_rewrite_children 50
refresh_pattern -i \.flv$          1440   80%    10080 ignore-no-cache override-expire ignore-private
refresh_pattern ^ftp:       1440    20% 10080 ignore-no-cache override-expire ignore-private
refresh_pattern ^http://[A-Za-z0-9]+\.lscache[0-9]\.c\.youtube\.com    9999999 90% 999999999 ignore-no-cache override-expire ignore-private
refresh_pattern ^http://[a-z0-9]+\.youtube\.com                        9999999 90% 999999999 ignore-no-cache override-expire ignore-private
refresh_pattern ^http://[a-z]+\.youtube\.com                           9999999 90% 999999999 ignore-no-cache override-expire ignore-private
refresh_pattern ^http://[a-z0-9]+\.ytimg\.com                          9999999 90% 999999999 ignore-no-cache override-expire ignore-private
refresh_pattern ^http://.*\.youtube\.com     9999999  90%  999999999 ignore-no-cache override-expire ignore-private
refresh_pattern get_video\?video_id         9999999  90%  999999999 ignore-no-cache override-expire ignore-private
refresh_pattern youtube\.com/get_video\?    9999999  90%  999999999 ignore-no-cache override-expire ignore-private
refresh_pattern ^http://.*\.youtube\.com/.*    9999999  100% 999999999 ignore-no-cache override-expire ignore-private
refresh_pattern (get_video\?|videoplayback\?|videodownload\?) 10080 99.99999% 999999 override-expire ignore-reload ignore-private negative-ttl=0
refresh_pattern ^gopher:    1440    0%  1440
refresh_pattern -i (/cgi-bin/|\?) 0 0%  0
refresh_pattern .               0       40%     4320
acl store_rewrite_list url_regex -i \.youtube\.com\/get_video\?
acl store_rewrite_list url_regex -i \.youtube\.com\/videoplayback \.youtube\.com\/videoplay \.youtube\.com\/get_video\?
acl store_rewrite_list url_regex -i \.youtube\.[a-z][a-z]\/videoplayback \.youtube\.[a-z][a-z]\/videoplay \.youtube\.[a-z][a-z]\/get_video\?
acl store_rewrite_list url_regex -i \.googlevideo\.com\/videoplayback \.googlevideo\.com\/videoplay \.googlevideo\.com\/get_video\?
acl store_rewrite_list url_regex -i \.google\.com\/videoplayback \.google\.com\/videoplay \.google\.com\/get_video\?
acl store_rewrite_list url_regex -i \.google\.[a-z][a-z]\/videoplayback \.google\.[a-z][a-z]\/videoplay \.google\.[a-z][a-z]\/get_video\?
acl store_rewrite_list url_regex -i (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\/videoplayback\?
acl store_rewrite_list url_regex -i (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\/videoplay\?
acl store_rewrite_list url_regex -i (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\/get_video\?
acl store_rewrite_list url_regex -i http://video\..*fbcdn\.net.*\.mp4.*
acl store_rewrite_list url_regex -i http://.[0-9]\.[0-9][0-9]\.channel\.facebook\.com/.*
acl store_rewrite_list url_regex -i http://.*\.mp4?
acl store_rewrite_list url_regex -i http://www\.facebook\.com/ajax/flash/.*
acl store_rewrite_list url_regex -i http://.*\.ak\.fbcdn\.net/.*
acl store_rewrite_list url_regex -i \.geo.yahoo\.com\?
storeurl_access allow store_rewrite_list
storeurl_access deny all
storeurl_rewrite_program /etc/squid/youtube
quick_abort_min 500 KB
acl shoutcast rep_header X-HTTP09-First-Line ^ICY.[0-9]
upgrade_http0.9 deny shoutcast
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
cache_mgr apogee@apogeek.com
cache_effective_user squid
cache_effective_group squid
snmp_port 3401
acl aclname snmp_community string
acl snmppublic snmp_community public
snmp_access allow snmppublic all
snmp_outgoing_address 0.0.0.0
dns_nameservers 8.8.8.8
dns_nameservers 8.8.4.4
dns_nameservers 4.2.2.2
coredump_dir /disk2-cache/var/cache

And here is the /etc/squid/youtube Perl script:
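#!/usr/bin/perl
# Store-URL rewrite helper for Squid: reads one request per line on stdin
# and prints the URL that Squid should use as the cache key.
$|=1;    # unbuffered output, so Squid receives each reply immediately
while (<>) {
@X = split;
$url = $X[0];    # first field is the requested URL (with helper concurrency off)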
if ($url=~s@^http://(.*?)/videoplayback\?(.*)id=(.*?)&.*@squid://videos.youtube.INTERNAL/ID=$3@){}
elsif
    ($url=~s@^http://(.*?)/videoplayback\?(.*)id=(.*?)@squid://videos.youtube.INTERNAL/ID=$3@){}
elsif
    ($url=~s@^http://(.*?)/videoplay\?(.*)id=(.*?)&.*@squid://videos.youtube.INTERNAL/ID=$3@){}
elsif
    ($url=~s@^http://(.*?)/videoplay\?(.*)id=(.*?)@squid://videos.youtube.INTERNAL/ID=$3@){}
elsif
    ($url=~s@^http://(.*?)/get_video\?(.*)video_id=(.*?)&.*@squid://videos.youtube.INTERNAL/ID=$3@){}
elsif
    ($url=~s@^http://(.*?)/get_video\?(.*)video_id=(.*?)@squid://videos.youtube.INTERNAL/ID=$3@){}
elsif
    ($url=~s@^http://(.*?)rapidshare(.*?)/files/(.*?)/(.*?)/(.*?)@squid://files.rapidshare.INTERNAL/$5@){}
elsif
    ($url=~s@^http://(.*?)fbcdn\.net/(.*?)/(.*?)/(.*?\.jpg)@squid://files.facebook.INTERNAL/$4@){}
elsif
    ($url=~s@^http://contenidos2(.*?)/(.*?)@squid://files.contenidos2.INTERNAL/$2@){}
elsif
    ($url=~s@^http://cdn(.*?)/([0-9a-zA-Z_-]*?\.flv)@squid://files.cdn.INTERNAL/$2@){}
elsif
    ($url=~s@^http://web.vxv.com/data/media/(.*?)@squid://files.vxv.INTERNAL/$1@){}
elsif
    ($url=~s@^http://(.*?)megaupload\.com/files/(.*?)/(.*?)@squid://files.megaupload.INTERNAL/$3@){}
elsif
    ($url=~s@^http://(.*?)mediafire\.com/(.*?)/(.*?)@squid://files.mediafire.INTERNAL/$3@){}
elsif
    ($url=~s@^http://(.*?)depositfiles\.com/(.*?)/(.*?)/(.*?)@squid://files.depositfiles.INTERNAL/$4@){}
elsif
    ($url=~s@^http://(.*?)\.files\.youporn\.com\/(.*?)\/([0-9a-zA-Z_-]*?\.flv)\?.*@squid://videos.youporn.INTERNAL/$3@){}
elsif
    ($url=~s@^http://(.*?)\.tube8\.com\/(.*?)\/([0-9a-zA-Z_-]*?\.flv)\?.*@squid://videos.tube8.INTERNAL/$3@){}
elsif
    ($url=~s@^http://(.*?)\.tube8\.com\/(.*?)\/([0-9a-zA-Z_-]*?\.flv)@squid://videos.tube8.INTERNAL/$3@){}
elsif
    ($url=~s@^http://(.*?)megaporn\.com\/files\/(.*?)\/(.*?)@squid://files.megaporn.INTERNAL/$3@){}

print "$url\n"; }
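
To see what the helper's rules do to a real request, the first two videoplayback rules can be replayed outside Squid. This is only a minimal Python sketch and not part of the setup above: the function name store_url and the shortened query strings are assumptions for illustration, while the host names and the id value come from URLs that readers posted in the comments.

```python
import re

# Ordered (pattern, replacement) pairs mirroring the first two rules of the
# Perl helper: "id=" followed by more parameters, then "id=" as the last one.
RULES = [
    (re.compile(r"^http://(.*?)/videoplayback\?(.*)id=(.*?)&.*"),
     r"squid://videos.youtube.INTERNAL/ID=\3"),
    (re.compile(r"^http://(.*?)/videoplayback\?(.*)id=(.*?)"),
     r"squid://videos.youtube.INTERNAL/ID=\3"),
]


def store_url(url):
    """Return the cache key for url, or url unchanged if no rule matches."""
    for pattern, replacement in RULES:
        # count=1 matches Perl's s@...@...@ without the /g flag.
        new_url, count = pattern.subn(replacement, url, count=1)
        if count:
            return new_url
    return url


if __name__ == "__main__":
    # Two different cache hosts serving the same video id (query shortened).
    a = ("http://o-o.preferred.true-bkk1.v22.lscache5.c.youtube.com"
         "/videoplayback?itag=34&sver=3&id=298a7f5e2e00b404")
    b = ("http://v10.nonxt3.c.youtube.com"
         "/videoplayback?id=298a7f5e2e00b404&itag=34&sver=3")
    print(store_url(a))
    print(store_url(b))
```

Both sample URLs collapse to the same key, which is the point of the rewrite: the video can be served from cache regardless of which YouTube host a client was sent to. Under these two rules the key depends only on id, so requests that differ only in other parameters such as itag would share one cache entry.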


Since I did this quite some time ago and have been busy coding something else, I don't remember exactly which parts I changed. Probably somewhere around the url_regex lines. If Piju's blog were still running (it currently isn't), we could do a diff to compare them. Anyway, here it is. These scripts are shared and distributed as is. If you change them, feel free to let me know. Otherwise, just enjoy!


16 comments:

ak47suk1 said...

nice. Going to try it later.

mypapit said...

great tip on Youtube squid caching dude, really helps me...

psyionx said...

So... does this work? The internet at my company is clogged with YouTube users. If this works I'm going to tell my IT admin; otherwise his head is going to spin over how to fix his slow-moving network.

psyionx said...

I hope this works... If it does, it's going to save the world.

linopod said...

Boss... how do I use this?

I don't know how to use the squid.conf and the youtube script.

Here's what I've done. I installed a fresh Ubuntu 10.10 with both the user name and the computer name set to squid, on a box with a 160 GB HDD.

sudo apt-get install squid gives me 2.7 stable.

I used sudo nautilus to set the permissions on the disk2-cache and disk1-1 folders and their subfolders.
I created the logs, the pid file and one other file as empty files.
There are no more fatal errors, but squid doesn't run.

I tried sudo squid -z and folders got populated in both disk1 and disk2...
but sudo squid -k check says
squid: ERROR: No running copy

Did I miss something?


I've been burning the midnight oil till the sun came up... I need some help and guidance...

ApOgEE said...

Oh man, bro, get squid running first.

$ sudo service squid start

linopod said...

Oh my goodness!!!

Thank YOU bro... !!!



squid@squid:~$ sudo service squid start
squid start/running, process 2173
squid@squid:~$ sudo squid -z
2011/08/11 19:32:18| Squid is already running! Process ID 2175
squid@squid:~$ netstat -lnp | grep 31288
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:31288 0.0.0.0:* LISTEN -
squid@squid:~$

Gizoogle said...

Any luck, anyone? I've been trying for so long D: If this is possible, please let me know =D

ApOgEE said...

at least it is still working on my server. ;)

Mr. Por said...

Sorry, I have no experience with squid and I'm having no luck.
It seems the pattern is not the same (my guess). Please advise.
---
http://o-o.preferred.true-bkk1.v22.lscache5.c.youtube.com
http://v10.nonxt3.c.youtube.com
---
MISS from CACHE_SRV:3128
squid/2.7.STABLE9

Mr. Por said...

I'm having no luck; maybe the URL pattern does not match.
Squid does not HIT the cache. Please advise.

thank you.
Por
---------------- URL Pattern --------------
http://o-o.preferred.true-bkk1.v22.lscache5.c.youtube.com/videoplayback?sparams=id&expire&ip&ipbits&itag&algorithm&burst&factor&fexp=903922&907301&algorithm=throttle-factor&itag=34&ip=119.0.0.0&burst=40&sver=3&signature=64426EBE1CDB24570FB7EDEE34A986305A61016F.18044CC9273F15046E3C7CB4E99225BFBC48654B&expire=1314273600&key=yt1&ipbits=8&factor=1.25&id=298a7f5e2e00b404

Dana said...

Hi, thanks, it works for me.
Can you add some code for caching Dailymotion videos?

Thank You!

nido said...

Dana!

Did you follow the same steps he gave in the article? I haven't tried this, but I want to cache YouTube videos...

CyPH3r said...

Is this script still working nowadays?

Eliezer Croitoru said...

You can use another method that also works on Squid 3.x, not only on old versions.
http://code.google.com/p/youtube-cache/

I was also working on a prototype that uses ICAP instead of a redirector or store-URL rewrite.

docbill said...

Perhaps it is because I'm using a different version of squid, but this did not work until I modified the Perl script. $X[0] is always 0; $X[1] is the actual URL.

I merged your configuration and script with one from another site that also did not work and finally got something that appears to work.

Only I'm still not quite satisfied, as I am trying to get something that works with a proxy.pac file, so I can limit what I redirect to squid. Thus far this only works if I redirect everything.
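
docbill's observation above, that the first field is 0 and the URL comes second, is what a helper would see if Squid prefixes each request line with a channel ID. As far as I know, that happens when helper concurrency is enabled. That this was the cause on docbill's machine is only an assumption. Here is a minimal Python sketch of a parser that copes with both layouts. parse_helper_line and format_reply are hypothetical names, and the sample request lines are made up for illustration.

```python
def parse_helper_line(line):
    """Split one helper request line into (channel_id, url).

    channel_id is None when the line starts directly with the URL.
    """
    fields = line.split()
    if not fields:
        return None, ""
    # A purely numeric first field is treated as a channel ID; a URL
    # always contains non-digit characters, so the two cannot be confused.
    if len(fields) >= 2 and fields[0].isdigit():
        return fields[0], fields[1]
    return None, fields[0]


def format_reply(channel_id, url):
    """Build the reply line, echoing the channel ID when one was given."""
    if channel_id is None:
        return url
    return f"{channel_id} {url}"


if __name__ == "__main__":
    for line in ("http://a.example/videoplayback?id=1 10.0.0.1/- - GET",
                 "0 http://a.example/videoplayback?id=1 10.0.0.1/- - GET"):
        channel_id, url = parse_helper_line(line)
        print(format_reply(channel_id, url))
```

In the Perl script the equivalent change would be to check whether $X[0] is numeric before deciding which field to rewrite, and to print that number back in front of the rewritten URL.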