View Full Version : Mails with cp1251 encoding are not parsed
afterburner
01-20-2008, 10:03 PM
Hi guys!
I am using RC2 here. I just installed it, and everything looks and works superbly, with the exception of one big problem.
Most of the people who will send mail to the system I am setting up write in Bulgarian, using two main encodings: UTF-8 and CP1251. With UTF-8 everything is OK: the message bodies are parsed and readable in the staff panel. But if someone writes using CP1251, the message body stays empty and only the subject is parsed. I checked the database and the problem is present there too: the "message" field in the "ost_ticket_message" table for that message is empty! Any ideas?
afterburner
01-21-2008, 04:47 PM
I just tested again with KOI8-R, another commonly used Cyrillic encoding, and osTicket does not parse the mail body (the message) either, the same as with CP1251/windows-1251. The corresponding field in the database is empty. Any ideas where to look? I'm new to the system. :(
afterburner
01-21-2008, 11:18 PM
Adding...
//Check for charset and make corrections if it's Windows-1251 or KOI8-U. By afterburner
if (stristr($this->getHeader($mid), "windows-1251")) {
    $body=iconv("windows-1251", "utf-8", $body);
} else if (stristr($this->getHeader($mid), "koi8-u")) {
    $body=iconv("koi8-u", "utf-8", $body);
}
...to getBody() in class.pop3.php solved the problem. It would be nice to see something like that implemented by default to handle different charsets ;)
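A more general version of the same header check might look like the sketch below. It reads whatever charset the header declares instead of testing for two fixed names. Both charsetFromHeader() and bodyToUtf8() are hypothetical helpers written for illustration, not part of osTicket, and the sketch assumes the raw header text is available as a string.

```php
<?php
// Hypothetical helper: pull the charset name out of a raw header block.
// Returns the lowercased charset, or null when none is declared.
function charsetFromHeader(string $header): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $header, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Hypothetical helper: convert a body to UTF-8 based on the declared charset.
function bodyToUtf8(string $body, string $header): string
{
    $charset = charsetFromHeader($header);
    if ($charset === null || $charset === 'utf-8' || $charset === 'us-ascii') {
        return $body; // Nothing declared, or already compatible with UTF-8.
    }
    // iconv() returns false for unknown charsets or invalid input.
    $converted = @iconv($charset, 'UTF-8', $body);
    return $converted === false ? $body : $converted;
}
```

If the conversion fails, the body is returned untouched, so a bad charset label does not leave the ticket message empty.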
This looks interesting, I will see if it's suitable for the next release.
afterburner
01-22-2008, 09:31 PM
I've come up with a better, cleaner way to detect the encoding, using PHP's built-in multibyte functions:
function getBody($mid) {
    $body ='';
    if(!($body = $this->fixCyrEncoding($this->getPart($mid,'TEXT/PLAIN'),$mid))) {
        if(($body = $this->fixCyrEncoding($this->getPart($mid,'TEXT/HTML'),$mid))) {
            //Convert tags of interest before we striptags
            $body=str_replace("</DIV><DIV>", "\n", $body);
            $body=str_replace(array("<br>", "<br />", "<BR>", "<BR />"), "\n", $body);
            $body=Format::striptags($body); //Strip tags??
        }
    }
    return $body;
}
function fixCyrEncoding ($body,$mid) { // By afterburner - fixing cyrillic encoded messages
    if(!$body) return $body; //Nothing to convert (part not found)
    $encodings=array("UTF-8", "WINDOWS-1251", "ISO-8859-5", "ISO-8859-1");
    $from=mb_detect_encoding($body,$encodings,true); //Strict mode: skip encodings the bytes are not valid in
    if($from && strcasecmp($from,"UTF-8"))
        $body=iconv($from, "utf-8", $body);
    return $body;
}
This way it detects the encoding regardless of the headers, so it's more accurate.
P.S. Adding this line to createTicket() corrects mail fetching when there is no subject specified:
$var['subject']=$mailinfo['subject']?imap_utf8($mailinfo['subject']):"[No Subject]";
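The detection idea in fixCyrEncoding() can be sketched on its own as follows. detectAndConvert() is a hypothetical name, and the candidate list is an assumption. Note that single-byte encodings such as WINDOWS-1251 and ISO-8859-5 accept almost any byte sequence, so detection cannot reliably tell them apart; the sketch only separates valid UTF-8 from one assumed single-byte fallback.

```php
<?php
// Hypothetical helper: detect the encoding in strict mode and convert to UTF-8.
function detectAndConvert(string $body, array $candidates = ['UTF-8', 'Windows-1251']): string
{
    // Empty or already valid UTF-8 input needs no conversion.
    if ($body === '' || mb_check_encoding($body, 'UTF-8')) {
        return $body;
    }
    // Strict mode only returns encodings in which the bytes are valid.
    $from = mb_detect_encoding($body, $candidates, true);
    if ($from === false) {
        return $body; // Unknown encoding: leave the bytes alone.
    }
    return mb_convert_encoding($body, 'UTF-8', $from);
}
```

Checking for valid UTF-8 first keeps messages that are already UTF-8 from being converted a second time.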
peter
01-22-2008, 09:53 PM
afterburner,
Thank you for the tips.
We can actually get the true encoding/charset in the getPart function using $struct->parameters. When calling getPart you simply pass the desired encoding type, i.e. getPart(messageID, mimeType, desiredCharset...). I believe this gives us better logic and the flexibility to even switch the desired encoding on the fly.
Once again thank you.
afterburner
01-22-2008, 10:05 PM
Hi Peter,
I may be mistaken, but what happens if someone sends a mail using PHP's mail() function without adding any headers specifying the charset, and the mail is encoded in WINDOWS-1251 or ISO-8859-5? I think (and I'm fairly new to this, so correct me if I'm wrong) that osTicket has no way of knowing how the message is encoded, so it will not display it properly. This encoding, I believe, has nothing to do with the "transfer encodings".
peter
01-22-2008, 10:17 PM
ab,
I am not objecting to auto detecting it. I was just pointing out that we shouldn't be auto detecting even when charset is available. I believe when charset is not explicitly set via the headers it defaults to server locale setting.
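The order described here (use the declared charset when there is one, guess only when there is not) could be sketched like this. toUtf8() and its $fallback parameter are hypothetical; the fallback charset is an assumption that would have to match the mail a given installation actually receives.

```php
<?php
// Hypothetical helper: prefer the declared charset, fall back to a guess.
function toUtf8(string $text, ?string $declared = null, string $fallback = 'Windows-1251'): string
{
    if ($declared !== null && $declared !== '') {
        // Trust the sender's declaration first; iconv() returns false on failure.
        $converted = @iconv($declared, 'UTF-8', $text);
        if ($converted !== false) {
            return $converted;
        }
    }
    // No usable declaration: keep valid UTF-8 as it is.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Last resort: assume the configured fallback charset.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}
```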
afterburner
01-22-2008, 10:25 PM
I guess you're right. The problem in my case is that we receive mails in 3-4 different encodings, and only the ones in UTF-8 are displayed. But I guess that's not common; most osTicket users are English-speaking and use UTF-8/ISO-8859-1.
Thanks for the answers ;)
afterburner
01-24-2008, 07:12 AM
I have one more question, not directly related to this discussion. Does anyone know a way to use the IMAP extension to get the MIME multipart headers (Content-Type and charset) that appear at the beginning of every part of the message, just after the boundary? The method I use parses the message headers, but when the message is multipart, the charset declaration is not present in the top-level header. I went through the manual for the IMAP functions, but I didn't find any that return exactly this information. Any help? :)
peter
01-24-2008, 03:23 PM
ab,
You have to traverse the parts in the message structure. This is what I was pointing out earlier about being able to get the charset of the individual parts. Below is an updated getPart function in class.pop3.php:
function getPart($mid,$mimeType,$encoding=false,$struct=null,$partNumber=false){
    if(!$struct)
        $struct=imap_fetchstructure($this->mbox, $mid);
    //Match the mime type.
    if($struct && strcasecmp($mimeType,$this->getMimeType($struct))==0){
        $partNumber=$partNumber?$partNumber:1;
        if(($text=imap_fetchbody($this->mbox, $mid, $partNumber))){
            if($struct->encoding==3 or $struct->encoding==4) //base64 and qp decode.
                $text=$this->decode($struct->encoding,$text);
            $charset=null;
            if($encoding) { //Convert text to desired mime encoding...
                if(!empty($struct->ifparameters) && !strcasecmp($struct->parameters[0]->attribute,'CHARSET') && strcasecmp($struct->parameters[0]->value,'US-ASCII'))
                    $charset=trim($struct->parameters[0]->value);
                $text=$this->mime_encode($text,$charset,$encoding);
            }
            return $text;
        }
    }
    //Do recursive search
    if($struct && !empty($struct->parts)){
        $prefix=$partNumber?$partNumber.'.':''; //Nested parts are numbered like 1.2
        foreach($struct->parts as $i=>$substruct) {
            if(($text=$this->getPart($mid,$mimeType,$encoding,$substruct,$prefix.($i+1))))
                return $text;
        }
    }
    //No luck.
    return false;
}
For example, to get plain text use $body = $this->getPart($mid,'TEXT/PLAIN',$this->charset), where $this->charset is the desired encoding.
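The traversal can also be tried without a mailbox. The sketch below walks a hand-built object tree shaped like the one imap_fetchstructure() returns (parts, ifparameters, parameters with attribute/value) and collects the charset of every leaf part. partCharset() and partCharsets() are hypothetical helpers. Unlike the function above, the sketch checks every parameter, not only parameters[0], since the charset is not guaranteed to come first.

```php
<?php
// Hypothetical helper: find the charset parameter of a single part, if any.
function partCharset(object $part): ?string
{
    // Without parameters the IMAP extension supplies an empty object, not an array.
    $params = (isset($part->parameters) && is_array($part->parameters)) ? $part->parameters : [];
    foreach ($params as $param) {
        if (strcasecmp($param->attribute, 'charset') === 0) {
            return trim($param->value);
        }
    }
    return null;
}

// Hypothetical helper: map part numbers ("1", "2", "1.1", ...) to charsets.
function partCharsets(object $struct, string $prefix = ''): array
{
    if (empty($struct->parts)) {
        // A leaf part; a non-multipart message body is part "1".
        return [($prefix === '' ? '1' : $prefix) => partCharset($struct)];
    }
    $found = [];
    foreach ($struct->parts as $i => $sub) {
        $number = ($prefix === '' ? '' : $prefix . '.') . ($i + 1);
        $found += partCharsets($sub, $number);
    }
    return $found;
}
```

With a live connection, the $struct argument would be the result of imap_fetchstructure() for the message.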
afterburner
01-24-2008, 07:06 PM
10x Peter, I'm actually using a version that's very near the one that you've done. The main difference is that I'm using the encoding hardcoded in the function as UTF-8. Thanks for all the help! Some rethinking after having some sleep, rereading the manual of the IMAP functions and your help guided me to the best and cleanest way to do what I wanted! I'd be glad to see these changes implemented in the next release ;)
afterburner
01-25-2008, 08:53 PM
Another "bug": in class.http.php, the function response uses WINDOWS-1252 as default encoding, which screws up premade responses written in Cyrillic for example ;) It should be UTF-8 as the rest of the site.
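A minimal sketch of the idea, assuming the response is sent with PHP's header(). contentTypeHeader() is a hypothetical helper, not the actual code in class.http.php:

```php
<?php
// Hypothetical helper: build a Content-Type header that defaults to UTF-8.
function contentTypeHeader(string $mime = 'text/html', string $charset = 'UTF-8'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mime, $charset);
}

// Usage inside a response function would be along the lines of:
// header(contentTypeHeader('text/plain'));
```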
joacimb
07-16-2008, 12:04 PM
Sorry for bringing this up again. Your discussion was in January, and the problem still exists in osTicket 1.6 RC4.
Tried to apply the patches supplied by afterburner, but that didn't help. =/
I managed to get it working by applying these changes to class.pop3.php
(basically replacing the imap_utf8 with imap_mime_header_decode):
// Method for decoding a MIME-encoded string to UTF-8.
function decodeMimeString($mimeString) {
    $result='';
    $mimeStringArray = imap_mime_header_decode($mimeString);
    foreach ( $mimeStringArray as $part ) {
        // Unencoded segments are reported with the charset "default"; pass them through unchanged.
        if ( !strcasecmp($part->charset, 'default') )
            $result .= $part->text;
        else
            $result .= mb_convert_encoding( $part->text, 'utf-8', $part->charset );
    }
    return $result;
}
function createTicket($mid,$emailid=0){
    global $cfg;
    $mailinfo=$this->getHeaderInfo($mid);
    $var['name']=$this->decodeMimeString($mailinfo['from']['name']);
    $var['email']=$mailinfo['from']['email'];
    $var['subject']=$mailinfo['subject']?$this->decodeMimeString($mailinfo['subject']):'[No Subject]';
    $var['message']=$this->decodeMimeString(Format::stripEmptyLines($this->getBody($mid)));
    $var['header']=$cfg->saveEmailHeaders()?$this->getHeader($mid):'';
    $var['emailId']=$emailid?$emailid:$cfg->getDefaultEmailId(); //ok to default?
    $var['name']=$var['name']?$var['name']:$var['email']; //No name? use email
    ...
Am I completely off track here?
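For comparison, the same kind of header decoding can be sketched without the IMAP extension by using iconv_mime_decode(), which decodes MIME encoded-words and converts them to a target charset in one call. This is a swapped-in technique, not what the patch above does, and decodeHeaderToUtf8() is a hypothetical name.

```php
<?php
// Hypothetical helper: decode a MIME-encoded header value to UTF-8.
function decodeHeaderToUtf8(string $header): string
{
    // CONTINUE_ON_ERROR keeps going past malformed encoded-words.
    $decoded = iconv_mime_decode($header, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
    return $decoded === false ? $header : $decoded;
}

// Example input: a subject whose bytes are WINDOWS-1251, base64-encoded.
$subject = '=?windows-1251?B?' . base64_encode("\xCF\xF0\xE8\xE2\xE5\xF2") . '?=';
echo decodeHeaderToUtf8($subject), "\n";
```

This applies to header fields such as the subject and sender name. Message bodies are not MIME encoded-words, so they still need a charset conversion based on the part's declared charset.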
innerself
08-19-2008, 06:49 PM
I'm having trouble even with the changes mentioned above!
http://img169.imageshack.us/img169/7316/osticketwl5.jpg
harius
08-21-2008, 04:25 PM
Hi. My solution works perfectly for me. Edit include/class.pop3.php:
109:
- $encodings=array('UTF-8','WINDOWS-1251', 'ISO-8859-5', 'ISO-8859-1');
+ $encodings=array('UTF-8','WINDOWS-1251', 'ISO-8859-5', 'ISO-8859-1', 'KOI8-R');
117:
- return imap_utf8($text);
+ return utf8_encode($text);
157:
- if($encoding && 0)
+ if($encoding)
That is all.
For Russian readers: http://nobrend.ru/?p=48
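One caveat about the utf8_encode() change: that function always treats its input as ISO-8859-1, and it is deprecated as of PHP 8.2. The sketch below shows the equivalent explicit conversion and what happens when the input is actually in another single-byte charset. latin1ToUtf8() is a hypothetical helper.

```php
<?php
// Hypothetical helper: the explicit equivalent of utf8_encode().
function latin1ToUtf8(string $text): string
{
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// ISO-8859-1 input converts as expected.
$portuguese = latin1ToUtf8("a\xE7\xE3o");

// WINDOWS-1251 bytes pushed through the same conversion come out as
// Latin letters with diacritics instead of Cyrillic.
$mojibake = latin1ToUtf8("\xCF\xF0");
```

So this change can only produce correct text when the bytes reaching that line really are ISO-8859-1.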
innerself
08-22-2008, 02:11 PM
Hello friend,
I applied these changes and reinstalled osTicket, but that did not solve the problem. My encoding is the one used in Brazil. What do you think could be happening?
http://img48.imageshack.us/img48/9528/imagemlw8.jpg
plameniv
12-10-2008, 11:31 AM
Hi,
Does anyone know how to set up Cyrillic support when creating tickets from a pipe?