Cal’s morning talk, continued

I zoned out over the development process part, to work on some Upcoming stuff. Sorry about that.

Now we’re on Unicode 101. Short discussion of charset vs. encoding, with a few examples: ASCII, UCS-2/UTF-16, and UTF-8. “charset” in HTTP headers and meta tags is a misnomer; it should be “encoding”. Rationale discussion using substr… my preferred example is truncation of db strings. Truncating a multibyte string at a byte offset can cut a character in half, which will completely bust a non-Unicode implementation. Some of the probs with MySQL are on older versions (3.0 and 4.0), so those aren’t huge issues for us. JavaScript mostly handles UTF-8 successfully, except for the escape() function, so we have to implement our own UTF-8 escape() function.
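To make the truncation point concrete, here is a minimal Python sketch (not from the talk). The function names and the sample title are made up for illustration. It compares a byte-oriented cut with one that backs up to a character boundary.

```python
def truncate_bytes_naive(data: bytes, limit: int) -> bytes:
    """Cut at a byte offset, the way a byte-oriented substr would."""
    return data[:limit]


def truncate_utf8_safe(data: bytes, limit: int) -> bytes:
    """Cut at or before `limit` bytes without splitting a character."""
    if len(data) <= limit:
        return data
    cut = limit
    # UTF-8 continuation bytes look like 0b10xxxxxx. Back up until the byte
    # at the cut position starts a new character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical db value: each of these characters takes 3 bytes in UTF-8.
title = "日本語のタイトル".encode("utf-8")

broken = truncate_bytes_naive(title, 8)  # ends in the middle of a character
intact = truncate_utf8_safe(title, 8)    # backs up to the previous boundary
```

Here the naive cut leaves two stray bytes of a third character on the end, which is exactly the kind of value that blows up later when something tries to decode it.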

UTF-8 and email. Content-Type headers only apply to content blocks. Headers need inline encoding, ex: “=?utf-8?Q?…” Defined in RFC 1342 (since superseded by RFC 2047). “Q” is similar to quoted-printable, “B” is base64. This allows non-ASCII in the subject line.
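For reference, Python's standard email.header module reads and writes this encoded-word syntax. A small sketch, with made-up subject lines:

```python
from email.header import Header, decode_header, make_header


def encode_subject(subject: str) -> str:
    """Turn a Unicode subject into an ASCII-only encoded-word string."""
    return Header(subject, charset="utf-8").encode()


def decode_subject(raw: str) -> str:
    """Turn an encoded-word header value back into a Unicode string."""
    return str(make_header(decode_header(raw)))


# Hand-written examples of the two encodings (the subject text is made up).
q_form = "=?utf-8?Q?Caf=C3=A9?="     # "Q": quoted-printable style
b_form = "=?utf-8?B?Q2Fmw6k=?="      # "B": base64

wire_value = encode_subject("Caf\u00e9 photos")  # safe to put in a Subject: line
```

The library picks whichever of the two encodings is shorter when encoding, so the exact output of encode_subject() may be either form.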

Any junk that you receive, you assume is Latin-1 that is misidentified.
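One way to read that rule, as a Python sketch: try UTF-8 first, and if the bytes don't decode, treat them as Latin-1. Reading “junk” as “not valid UTF-8” is my assumption, and the function name is made up.

```python
def to_text(raw: bytes) -> str:
    """Decode input as UTF-8, falling back to Latin-1 for anything invalid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this cannot fail.
        return raw.decode("latin-1")
```

The fallback never raises, which is the appeal: every input ends up as some string, even if a truly unknown encoding comes out as the wrong characters.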

Filtering is done only at the outside, possibly except for “signed data.” I don’t really understand his example here, so I’ll ask later. You never want chars below 0x20, apart from normalized carriage returns. Carriage returns mess up XML attributes, though.
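A rough Python sketch of that filter (not the code from the talk). I'm assuming “normalized” means every line-break variant becomes a single "\n", and that tabs get stripped along with the other control characters. The function names are made up.

```python
import re

# Everything below 0x20 except "\n" (0x0A), which is kept as the normalized
# line break. Note that this also strips tabs (0x09).
_CONTROL_CHARS = re.compile(r"[\x00-\x09\x0b-\x1f]")


def filter_text(text: str) -> str:
    """Normalize line breaks to "\\n", then drop remaining control chars."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return _CONTROL_CHARS.sub("", text)


def filter_attribute(text: str) -> str:
    """Same filter for values headed into an XML attribute: no line breaks."""
    return filter_text(text).replace("\n", " ")
```

Running this once at the edge, on everything that comes in, fits the “filter only at the outside” idea: nothing further in has to worry about control characters.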

He has a filtering PCRE sample, but there’s a mistake in it, and he recommends using iconv to convert from UTF-8 to UTF-8.
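The UTF-8 to UTF-8 trick is a round trip through a decoder, so that only valid sequences survive. A Python analogue (not iconv itself, and the function name is made up) that silently drops invalid bytes looks like this:

```python
def scrub_utf8(raw: bytes) -> bytes:
    """Round-trip bytes through a UTF-8 decoder, dropping invalid sequences."""
    return raw.decode("utf-8", errors="ignore").encode("utf-8")
```

Whether to drop bad bytes or reject the whole input is a policy choice; this sketch drops them.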

Discusses HTML and JavaScript filtering. Some common XSS hacks. Promotes lib_filter.

Dealing with email – receiving email is useful, very handy to support mobile blogging, support tracking. Discusses uses of pipes from /etc/aliases.
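For the pipe setup, the mail system hands the raw message to a program on standard input. A minimal Python sketch of the receiving end follows; the alias name and script path in the comment are hypothetical, as are the function names.

```python
import sys
from email import message_from_binary_file, policy

# A sendmail-style /etc/aliases entry pipes each message to a program, e.g.
# (hypothetical alias and path):
#   photos: "|/usr/local/bin/handle_mail.py"


def read_message(stream):
    """Parse one raw message from a binary stream such as sys.stdin.buffer."""
    return message_from_binary_file(stream, policy=policy.default)


def handle_from_stdin():
    """Entry point for the piped program: parse stdin, return the subject."""
    msg = read_message(sys.stdin.buffer)
    return msg["Subject"]
```

Reading bytes rather than text matters here, because the message can contain parts in several different encodings.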

MIME in a nutshell – defines some content types, multipart mail can contain subparts. For mail with attachments, main part is in text, rest is in binary.
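To see the part structure, here is a Python sketch using the standard email package (not Flickr's parser). It builds a made-up message with one attachment and lists its leaf parts. The addresses, filename, and function names are placeholders.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample() -> bytes:
    """Build a made-up message: a text body plus one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Photo from my phone"
    msg["From"] = "sender@example.com"
    msg["To"] = "upload@example.com"
    msg.set_content("Caption text")
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(b"\x89PNG fake bytes", maintype="image",
                       subtype="png", filename="photo.png")
    return msg.as_bytes()


def list_parts(raw: bytes):
    """Return (content type, filename) for every non-container part."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart/* containers themselves
        parts.append((part.get_content_type(), part.get_filename()))
    return parts
```

Calling list_parts(build_sample()) gives one text/plain part with no filename, followed by the image/png attachment.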

Mail::mimeDecode – what the Flickr mail parser is based on. Not too broken for their use, heh. application/ms-tnef (Transport Neutral Encapsulation Format) – that’s MS’s winmail.dat, only used by Outlook. It’s a packed list of files and metadata. The spec was buried on the MS site, and the format is fairly easy to unpack if you know the spec. Some code somewhere may document or handle it.

Incoming email isn’t necessarily Latin-1 or UTF-8. Forcing a character set is kind of lame. You can find out the intended charset from the email’s Content-Type header. Spec states they must state the content-type unless they’re Latin-1. Fortunately, iconv does the heavy lifting.
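A Python sketch of honoring the declared charset, for a single-part message. Python's codecs stand in for iconv here, and the Latin-1 default follows the notes above rather than anything I've checked against the spec. The function name is made up.

```python
from email import message_from_bytes


def body_text(raw: bytes, default_charset: str = "latin-1") -> str:
    """Decode a single-part message body using its declared charset."""
    msg = message_from_bytes(raw)
    # decode=True undoes base64 / quoted-printable and returns raw bytes.
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or default_charset
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # The sender declared a charset name this system does not know.
        return payload.decode(default_charset, errors="replace")
```

Using errors="replace" means a mislabeled message still produces text rather than an exception, at the cost of some replacement characters.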

Wireless messages suck, because they do special casing but then also often append crap at the end. Attachments arrive as images but with a text/plain MIME type. Wireless carriers also append additional images, which are their logo and extra spacer gifs. The worst offenders send links to images instead of actual images. Sometimes slashes are doubled up, so they break automated grabs. Try to capture weird emails and add them to a test suite, along with expected results.
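For the doubled-slash links, one possible cleanup before fetching is to collapse repeated slashes in the path only. This is a sketch with a made-up function name, and I'm assuming the doubling happens in the path rather than elsewhere in the URL.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of slashes in the path, leaving the "://" part alone."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))
```

Splitting the URL first keeps the scheme separator and the query string out of reach of the regex.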

The desired system is a closed test system with easily repeatable regression tests.
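The shape of such a regression suite, as a Python sketch: each captured email is stored with the result the parser should produce, and the runner reports every mismatch. The names and the case format here are made up for illustration.

```python
def run_regressions(cases, parse):
    """Run `parse` over captured messages and collect any mismatches.

    `cases` is an iterable of (name, raw_bytes, expected) tuples.
    Returns a list of (name, expected, actual) for every failing case.
    """
    failures = []
    for name, raw, expected in cases:
        actual = parse(raw)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures
```

Because the inputs are stored bytes and nothing touches the network, the same run gives the same answer every time, which is what makes it usable as a regression check.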

That completes the pre-lunch session, so we’re off to lunch.
