Maciej - 2022-01-14 00:29:54
Hi,
I am trying to use mime_parser_class to decode a Google Takeout Mbox backup. Because the backup can be a few GB, I read a single message from the file at a time and pass that message to your class.
I am using $mime->Decode() and $mime->Analyze() (as in test_message_decoder.php). Everything runs without errors, but all I get is the whole message in $decoded[0]["Body"] or in $results["Data"]: no headers and no decoded multipart.
What am I doing wrong?
Is this class capable of decoding Google Gmail message format?
Bests,
Maciej.
Manuel Lemos - 2022-01-14 00:57:41 - In reply to message 1 from Maciej
Hello Maciej,
Would it be possible for you to share a sample script that I can use to reproduce the problem?
Maciej - 2022-01-14 01:41:46 - In reply to message 2 from Manuel Lemos
Hello Manuel,
Thanks for your fast response.
I am not sure the script is the problem, but sure, I can share it (here?). The important part is only a few lines taken from your test.php:
$data = $full_gmail_message; // placeholder: the full Gmail message as a string
$mime = new mime_parser_class;
$mime->mbox = 1;
$mime->decode_bodies = 1;
$mime->ignore_syntax_errors = 1;
$mime->track_lines = 0;
$mime->use_part_file_names = 0;
$mime->extract_addresses = 1;
$parameters = array('Data' => $data);
$deco = $mime->Decode($parameters, $decoded);
if ($deco) {
    var_dump($decoded[0]);
    $res = $mime->Analyze($decoded[0], $results);
    var_dump($results);
}
The problem could be $data, which is the Gmail message. It starts like this:
From 1720730285230359859@xxx Sat Jan 01 05:51:18 +0000 2022
X-GM-THRID: 1720730285230359859
X-Gmail-Labels: =?UTF-8?Q?Skrzynka_odbiorcza,Wa=C5=BCne,Otwarte,Kategoria:_aktualizacje?=
Delivered-To: admin@mydomain.com
Received: by 2002:a4a:6d49:0:0:0:0:0 with SMTP id w9csp10425847oof;
Fri, 31 Dec 2021 21:51:18 -0800 (PST)
X-Received: by 2002:a2e:8899:: with SMTP id k25mr5669145lji.98.1641016278131;
Fri, 31 Dec 2021 21:51:18 -0800 (PST)
ARC-Seal: i=2; a=rsa-sha256; t=1641016278; cv=pass;
d=google.com; s=arc-20160816;
b=cQDFr7bFa1J5dPlKFGuOxchdehq1sbDj4r+f6R9r6oYm2Fvvkt9FyWd+n/pR2cnAER
yv1u44+QnAK8mrIWRDVE+NN+w3lrInS+ZOG/lkUi09bldSk5ThRQW/M20mpHjvI52ZQa
er75d2afdlmN8xFELnhlW6N3HEN7HuD2ubKycCiMvY9LbfItza7CUFeYp4ICHx9bxI6V
IEKS3QbnFFyuMaqZxeezFvaJ9ixX8MwT1my7rYip3lNlCbZA+G+LOCpr+rFvgZ4qMFXE
l8uAgjdf9njSXcC33my70/iBhO+eYP2kFe73UzQsOu+hDWTaP5K/nr31Kb8khunwt378
OGzQ==
(...)
(plus the rest of the message)
and your parser gives the following result:
---
$decoded[0] = Array
(
    [Headers] => Array
        (
            [from ] => 1720730285230359859@xxx Sat Jan 01 05:51:18 +0000 2022
        )
    [Parts] => Array
        (
        )
    [Position] => 0
    [Body] => (...)
    [BodyPart] => 1
    [BodyLength] => 45816
)
$results = Array
(
    [Type] => text
    [Description] => Text message
    [Data] => (...)
)
The whole message ends up in [Body] and in [Data]. Is that correct?
I would prefer to get some of the data decoded, but that is obvious, I guess. ;-)
Regards,
Maciej.
Manuel Lemos - 2022-01-14 03:16:51 - In reply to message 3 from Maciej
Good. Can you upload the data as a file, for instance to your Google Drive account or some other Web-based file storage service, and share the link so I can use the file to reproduce the problem?
Maciej - 2022-01-14 14:21:01 - In reply to message 4 from Manuel Lemos
No problem, I just have to strip some sensitive data from that backup, so it will take a while. It is just a standard Google Takeout (backup) of any Google/Gmail account (like Workspace); anybody can get one.
But at this point, what matters to me is a simple answer to this question:
Can your PHP MIME Email Message Parser Class read Google Takeout MBOX messages? (as of 2022)
[a] yes, it should without issues, tested many times
[b] yes, it should
[c] no, it won't read Google format
[d] no, it won't read Google format, but let's do it!
Cheers,
M.
Manuel Lemos - 2022-01-14 22:00:28 - In reply to message 5 from Maciej
The answer is b).
I followed the standard RFC documentation for the MIME messages. It should read that format.
Specifically, I did not test the Google Takeout format because I never needed it. I assume it is compliant with the RFC standards.
If it does not work for some reason, I can improve the package to make it work. I need a test data file to reproduce your situation and determine if any class code or configuration improvements are necessary.
Just let me know when you make the data file available. Usually, I have some time on weekends, as I work on business matters that generate revenue during the week, so I can justify the work to maintain classes like this and other projects for PHP developers.
Maciej - 2022-01-15 04:21:51 - In reply to message 6 from Manuel Lemos
Hi Manuel,
great, b), so let's do it! The example file is ready on my Google Drive, and my e-mail is mxw3000@gmail.com. Please let me know which address I should share it with.
The Google Takeout Mbox format is useful when you need to make a backup of your Gmail messages. It can be imported into Thunderbird without problems, and BitRecover Mbox viewer reads it as well (but slowly), so I assume it is indeed a standard format.
But today's Gmail message headers have so many extra fields (unnecessary for just viewing the message) that they could be confusing (all this DMARC, DKIM, ARC, etc.).
I hope you will find time to take a look at this case.
All I need to extract (at this point) are the proper from/to email addresses, subject, date, Gmail labels, and the list of attachments.
Bests,
M.
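For the Gmail labels mentioned above, the X-Gmail-Labels value in the sample message is an RFC 2047 encoded word, so it needs decoding before it is readable. A minimal sketch, assuming PHP's iconv extension is available and that labels are comma-separated as the sample suggests (neither the variable names nor the splitting rule come from mime_parser_class):

```php
<?php
// Raw header value copied from the sample message in this thread.
$raw = '=?UTF-8?Q?Skrzynka_odbiorcza,Wa=C5=BCne,Otwarte,Kategoria:_aktualizacje?=';

// iconv_mime_decode() is PHP's built-in decoder for RFC 2047 encoded words.
$decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

// Assumption: one label per comma-separated item, as in the sample above.
$labels = ($decoded === false) ? array() : array_map('trim', explode(',', $decoded));

print_r($labels);
```

A label that itself contains a comma would be split incorrectly by this sketch, so treat the splitting rule as a starting point only.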
Manuel Lemos - 2022-01-16 13:55:52 - In reply to message 7 from Maciej
Yes, the MBOX format is a set of MIME messages concatenated with a special syntax to distinguish each message. So the parser class does almost the same.
Can you please go here to contact me so I can share my email privately?
phpclasses.org/professionals/contac ...
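The MBOX layout described above (messages concatenated, each introduced by a "From " separator line) can be sketched as a small reader that hands out one raw message at a time, which suits a multi-GB backup. This is only an illustration: iterate_mbox_messages() is a hypothetical helper, not part of mime_parser_class, and it assumes a separator is a line starting with "From " that follows a blank line or the start of the file.

```php
<?php
// Hypothetical helper: yields each raw message from an open mbox stream,
// so the whole file never has to be held in memory.
function iterate_mbox_messages($handle)
{
    $message = '';
    $previous_blank = true; // the start of the file counts as a boundary
    while (($line = fgets($handle)) !== false) {
        // Assumed separator rule: "From " at line start, after a blank line.
        if ($previous_blank && $message !== '' && strncmp($line, 'From ', 5) === 0) {
            yield $message;
            $message = '';
        }
        $message .= $line;
        $previous_blank = (rtrim($line, "\r\n") === '');
    }
    if ($message !== '') {
        yield $message; // last message in the file
    }
}

// Usage with a tiny in-memory mbox made of two invented messages.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "From a@xxx Sat Jan 01 05:51:18 +0000 2022\nSubject: one\n\nFirst body\n\n"
    . "From b@xxx Sat Jan 01 06:00:00 +0000 2022\nSubject: two\n\nSecond body\n");
rewind($handle);
$messages = iterator_to_array(iterate_mbox_messages($handle), false);
fclose($handle);
echo count($messages), "\n";
```

Each yielded string keeps its "From " separator line, so it could be passed as the 'Data' parameter in the same way as the script earlier in this thread.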
Maciej - 2022-01-16 15:41:16 - In reply to message 8 from Manuel Lemos
No problem, PM sent, thank you.
Bests,
M.
Manuel Lemos - 2022-01-16 23:14:47 - In reply to message 9 from Maciej
I got your message and replied to it. Just let me know here if you received it or not.