Update to Manual and new case study from BFI #432

digitensions · 2024-03-07T12:46:01Z

Hi there,

Submitting some updates for your review, as possible additions to the RAWcooked website.

Thanks,
Joanna

digitensions

Added those amendments Jérôme, thanks!

JeromeMartinez · 2024-03-07T15:42:45Z

Doc/Case_study.md

+- 40Gbps Network card  
+- NAS storage with 40Gbps network card  
+
+The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads) x 16 (cores) x 2 (sockets) = 64. To retrieve these figures we would use Linux's ```lscpu```.


@digitensions is it possible to get a rough estimate of the encoding speed when a single piece of content is processed, e.g. RGB 10-bit 2K? I am curious to see how the CPU behaves with a single encode.

@digitensions also, if possible, could we have the average speed per "parallel" job and the number of "parallel" jobs you run, for similar content (e.g. all RGB 10-bit 2K), so we can see the difference between a single job and "parallel" usage?
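
For readers following the thread-count calculation in the quoted paragraph, here is a minimal Python sketch of it. It assumes `lscpu` output that contains the usual `Thread(s) per core`, `Core(s) per socket` and `Socket(s)` fields. The function name and the sample text are illustrative only and are not taken from the case study or the BFI scripts.

```python
def cpu_threads_from_lscpu(lscpu_text: str) -> int:
    """Multiply threads per core, cores per socket and sockets from lscpu text."""
    fields = {}
    for line in lscpu_text.splitlines():
        # lscpu prints "Key:   value"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    threads = int(fields["Thread(s) per core"])
    cores = int(fields["Core(s) per socket"])
    sockets = int(fields["Socket(s)"])
    return threads * cores * sockets


# Hypothetical lscpu excerpt matching the configuration described in the diff.
SAMPLE = """\
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
"""

print(cpu_threads_from_lscpu(SAMPLE))
```

On a real server the text could come from running `lscpu` and passing its standard output to the function.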

digitensions · 2024-03-07T21:06:22Z

Yes I’ll try and get one encoded in the coming weeks. We have a lot of encodings running at the moment to clear storage backlogs.

On 7 Mar 2024, at 15:43, Jérôme Martinez wrote:

@JeromeMartinez commented on this pull request. In Doc/Case_study.md:

+* [Additional resources](#links)
+
+---
+### <a name="server_config">Server configurations</a>
+
+To encode our DPX sequences we have a single server that completes this work against 6 different Network Attached Storage (NAS) devices in parallel.
+
+Our current server configuration:
+- Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
+- 252GB RAM
+- 32-core with 64 CPU threads
+- Ubuntu 20.04 LTS
+- 40Gbps Network card
+- NAS storage with 40Gbps network card
+
+The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads) x 16 (cores) x 2 (sockets) = 64. To retrieve these figures we would use Linux's ```lscpu```.

retokromer · 2024-04-09T10:44:53Z

Doc/Case_study.md

+
+Next the first file within the image sequence is checked against a [Media Area's MediaConch software](https://mediaarea.net/MediaConch) policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion or possible anomalies in the DPX.
+
+The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequences to make a 1TB FFV1 Matroska.  For 2K Luma and all 4K we must assume that very small size reductions could occur so map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size.


atleast > at least?
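
The size planning described in the quoted paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions: the source limits (1.3TB for 2K RGB, 1TB for 2K Luma and all 4K) come from the text, while the decimal terabyte, the dictionary keys, the function name and the simple "divide and round up" split are hypothetical and do not reflect the BFI's actual Python scripts.

```python
TB = 1000 ** 4  # assumption: decimal terabytes; the text does not say TB or TiB

# Largest source sequence assumed to produce an FFV1 Matroska of at most 1TB.
MAX_SOURCE_BYTES = {
    "2K RGB": 13 * TB // 10,  # 1.3TB, assumed to shrink by at least one third
    "2K Luma": 1 * TB,        # very small reductions assumed, so 1TB maps to 1TB
    "4K": 1 * TB,
}


def folders_needed(sequence_bytes: int, flavour: str) -> int:
    """Return how many folders a sequence would be split across (hypothetical rule)."""
    limit = MAX_SOURCE_BYTES[flavour]
    # Integer ceiling division avoids floating point rounding surprises.
    return max(1, (sequence_bytes + limit - 1) // limit)


print(folders_needed(2 * TB, "2K RGB"))
print(folders_needed(12 * TB // 10, "4K"))
```

Under these assumptions a 2TB 2K RGB sequence and a 1.2TB 4K sequence would each be split across two folders.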

retokromer · 2024-04-09T10:45:32Z

Doc/Case_study.md

+
+### <a name="muxing">Encoding the image sequence</a>  
+
+To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into this one simple flag. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.


imporantly > importantly

retokromer · 2024-04-09T10:48:30Z

Doc/Case_study.md

+
+### <a name="ffv1_demux">FFV1 Matroska decode to image sequence</a>
+
+We have automation scripts that return an FFV1 Matroska back to the original image sequence. These are essential for our film preseration colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` command again which can select decode when an FFV1 Matroska is supplied.  


preseration > preservation?
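
As a rough illustration of the `--all` usage described in the two quoted sections, the sketch below builds the same style of command for both directions. It assumes the `rawcooked` executable is on the PATH and uses RAWcooked's `-o` option to name the output. The function names and paths are hypothetical, and any licence or other options used in the BFI automation are not shown.

```python
import subprocess
from typing import List


def rawcooked_all_command(source: str, output: str) -> List[str]:
    """Build a `rawcooked --all` command line.

    Per the case study text, --all selects encode when given an image
    sequence folder and decode when given an FFV1 Matroska file.
    """
    return ["rawcooked", "--all", source, "-o", output]


def run_rawcooked(source: str, output: str) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(rawcooked_all_command(source, output), check=True)


# Hypothetical paths: encode a DPX folder, then decode the resulting Matroska.
print(rawcooked_all_command("N_123456_01of01", "N_123456_01of01.mkv"))
print(rawcooked_all_command("N_123456_01of01.mkv", "N_123456_01of01_decoded"))
```

Only the command lists are printed here; `run_rawcooked` would be called once the paths point at real material.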

retokromer · 2024-04-09T10:49:36Z

Doc/Case_study.md

+---
+## Conclusion
+
+We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film.  We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview.  There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure.  Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.  


seqeuence > sequence

retokromer

a few typos

digitensions added 30 commits February 19, 2024 10:01

Create Case_study.md

06df171

First entry

Update Case_study.md

91d4251

Update Case_study.md

57189d4

Update Case_study.md

ed30cb7

Update Case_study.md

7b67a82

Update Case_study.md

70f60ca

Update Case_study.md

c1009bc

Update Case_study.md

d560eec

Update Case_study.md

382061a

Update Case_study.md

7a1518f

Update Case_study.md

6af0e10

Update Case_study.md

ab6094e

Update Case_study.md

10cdb86

Update Case_study.md

e5086e2

Adding anchor points

Update Case_study.md

ce94d40

1 year study data

Update Case_study.md

0709d92

Update Case_study.md

91eb2ef

Image sequence assessment

Update Case_study.md

23b656e

Update Case_study.md

aef832f

Update Case_study.md

f374a74

Update Case_study.md

191c5b8

Add horizontal lines

Update Case_study.md

0d1450b

Update Case_study.md

07df454

Update Case_study.md

084925b

Add demux section

Update Case_study.md

af21eba

Update Case_study.md

0afbf39

Update Case_study.md

32d1077

Update Case_study.md

b31ccda

Update Case_study.md

615f78d

Update mux to encode

Update Case_study.md

a785a29

Spaces and conclusion

digitensions added 7 commits March 7, 2024 12:38

Update User_Manual.md

1842df5

Update User_Manual.md

3b9bbfc

Update User_Manual.md

cd1ebb4

Update User_Manual.md

3a8d404

Update User_Manual.md

656f4c3

Update User_Manual.md

4dd0d3b

Update User_Manual.md

bb0aa27

digitensions added 7 commits April 9, 2024 10:53

Update Case_study.md

9e9ca24

Add 2k bits

Update Case_study.md

080ce97

Update User_Manual.md

5e08699

Add sublicense data

Update User_Manual.md

4b8deee

Update User_Manual.md

1bd00b0

Typing error fixes

Update Case_study.md

69522a5

Add MKV durations

Update User_Manual.md

69ac58b

Clean up month statement

digitensions and others added 8 commits April 12, 2024 15:04

Update Case_study.md

913f81a

Update Case_study.md

4fd2998

Update Case_study.md

f286b08

Update encoding timings with set parallel encoding number

Update Case_study.md

89e35cf

Update Case_study.md

63e3ab4

Rewrite last paragraph, for review

Merge branch 'MediaArea:main' into master

46d7b32

Update Case_study.md

291e0e3

Clean up spelling errors

Update Case_study.md

224a7dd

Added network contention info to the introduction section
