-
IEEE Transactions on Broadcasting Information for Readers and Authors
-
A Novel Low-PAPR Integrated Navigation-Communication Waveform Design for LEO Satellite Systems
Zhaoxian Yang, Miaoran Peng, Yu Zhang, Xinkun Zheng, Jiaxi Zhou, Tao Jiang
Keywords: OFDM; Peak to average power ratio; Synchronization; Symbols; Satellite navigation systems; Navigation; Transmitters; Receivers; Low earth orbit satellites; Computer architecture; Waveform; Low Earth Orbit; Low Earth Orbit Satellite System; Computational Complexity; Time Delay; Signal Propagation; Detection Probability; Bit Error Rate; Power Ratio; Bit Error; Doppler Shift; Phase Sequence; Orthogonal Frequency Division Multiplexing; Acquisition Phase; Pseudo-random Sequence; Peak-to-average Power Ratio; Orthogonal Frequency Division Multiplexing Signal; Time Delay Estimation; Navigation Function; Original Signal; Bit Error Rate Performance; Orthogonal Frequency Division Multiplexing Symbol; Coherent Integration; Global Navigation Satellite System; High Power Amplifier; Doppler Frequency; Digital Signal Processing; Complex Multiplication; Doppler Frequency Shift; Complementary Cumulative Distribution Function; Integrated navigation communication signal; OFDM; PAPR; satellite communication
Abstract: Current orthogonal frequency-division multiplexing (OFDM)-based integrated navigation-communication (INC) designs suffer from critical limitations, particularly high peak-to-average power ratio (PAPR), which ultimately compromises both communication throughput and positioning accuracy. This paper proposes an acquisition-assisted low-PAPR INC signal design scheme. Specifically, the transmitter utilizes the selectivity of the pseudorandom sequence designed for the navigation function in the frame header to indicate the index of the phase sequence that minimizes the PAPR of the OFDM signal, thereby avoiding the transmission of side information (SI) and achieving a reduction in PAPR. The receiver leverages the correlation properties of the designed synchronization sequence to jointly recover the SI and estimate the Doppler shift and time delay during the acquisition phase. The computational complexity of the proposed scheme is analyzed for both the signal-generation and SI-recovery processes. Simulation results demonstrate that the proposed INC scheme achieves significant PAPR reduction without degrading the bit error rate (BER), maintains robust detection probability in low signal-to-noise ratio (SNR) environments, and attains high acquisition accuracy.
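The phase-sequence selection behind this abstract follows the classic selected-mapping (SLM) idea: rotate the frequency-domain symbols by several candidate phase sequences and transmit the candidate whose time-domain OFDM signal has the lowest PAPR. The sketch below is a generic SLM illustration, not the paper's acquisition-assisted design; the function names and the QPSK setup are our own.

```python
import numpy as np

def papr_db(x):
    """PAPR of a complex baseband signal, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def slm_min_papr(symbols, phase_seqs):
    """Selected mapping: rotate the frequency-domain symbols by each
    candidate phase sequence and keep the index whose time-domain
    OFDM signal (IFFT output) has the lowest PAPR."""
    best = min(range(len(phase_seqs)),
               key=lambda u: papr_db(np.fft.ifft(symbols * phase_seqs[u])))
    return best, np.fft.ifft(symbols * phase_seqs[best])

rng = np.random.default_rng(0)
n, n_cand = 64, 8                          # subcarriers, candidate sequences
qpsk = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)
# index 0 is the all-ones sequence, i.e. the unmodified signal
seqs = [np.ones(n)] + [np.exp(1j * rng.integers(0, 4, n) * np.pi / 2)
                       for _ in range(n_cand - 1)]
u, sig = slm_min_papr(qpsk, seqs)
```

In plain SLM the winning index `u` must be sent as side information; the scheme in this paper instead encodes it in the choice of the frame-header pseudorandom sequence, which the receiver recovers during acquisition.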
-
Adaptive Deep Joint Source-Channel Coding for One-to-Many Wireless Image Transmission
Lei Luo, Ziyang He, Junjie Wu, Hongwei Guo, Ce Zhu
Keywords: Wireless communication; Bandwidth; Training; Image coding; Image communication; Adaptation models; Channel coding; Source coding; Image reconstruction; Time-varying channels; Wireless; Joint Source-channel; Joint Source-channel Coding; Deep Joint Source-channel Coding; Computational Complexity; Source Code; Low Complexity; Channel State; Latent Space; Low Computational Complexity; Transmission Performance; Adaptive Modulation; Channel Quality; Multi-scale Feature Fusion; Bandwidth Resources; Ratio Bandwidth; Convolutional Layers; Input Image; Additive Noise; Global Context; Channel Bandwidth; Forward Error Correction; Average Pooling Operation; Orthogonal Frequency Division Multiplexing; Training Step; Channel Vector; Pixel Domain; Receiver Side; ReLU Activation Layer; Peak Signal-to-noise Ratio; Deep joint source-channel coding; bandwidth adaptability; neighboring attention; gradient normalization
Abstract: Deep learning-based joint source-channel coding (DJSCC) has recently made significant progress and emerged as a potential solution for future wireless communication. However, several crucial issues still require in-depth exploration to enhance the efficiency of DJSCC, such as channel-quality adaptability, bandwidth adaptability, and the delicate balance between efficiency and complexity. This work proposes an adaptive deep joint source-channel coding scheme tailored for one-to-many wireless image transmission scenarios (ADMIT). First, to effectively improve transmission performance, neighboring attention is introduced as the backbone of the proposed ADMIT method. Second, a channel quality adaptive module (CQAM) is designed based on multi-scale feature fusion, which seamlessly adapts to fluctuating channel conditions across a wide range of channel signal-to-noise ratios (CSNRs). Third, to be precisely tailored to different bandwidth resources, the channel-gain adaptive module (CGAM) dynamically adjusts the significance of individual channels within the latent space, ensuring seamless accommodation of varying bandwidths with a single model through bandwidth adaptation and symbol completion. Additionally, to mitigate the imbalance of loss across multiple bandwidth ratios during training, a gradient normalization (GradNorm)-based training strategy is leveraged to adaptively balance the loss reduction. Extensive experimental results demonstrate that the proposed method significantly enhances transmission performance while maintaining relatively low computational complexity. The source code is available at: https://github.com/llsurreal919/ADMIT.
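The GradNorm strategy named in this abstract balances several losses by steering each task's weighted gradient norm toward a common target. A heavily simplified, autograd-free sketch of one such weight update follows; the sign-step and all names are our simplification of the general GradNorm idea, not the ADMIT training code.

```python
import numpy as np

def gradnorm_step(weights, grad_norms, loss_ratios, alpha=1.5, lr=0.1):
    """One simplified GradNorm-style update: each task's weighted gradient
    norm is pushed toward a common target (the mean norm, scaled by that
    task's relative training rate), then weights are renormalized so they
    sum to the number of tasks. The sign-step stands in for gradient
    descent on the GradNorm objective."""
    w = np.asarray(weights, float)
    g = np.asarray(grad_norms, float) * w          # weighted gradient norms
    r = np.asarray(loss_ratios, float)
    target = g.mean() * (r / r.mean()) ** alpha    # per-task target norm
    w = np.clip(w - lr * np.sign(g - target), 1e-3, None)
    return w * len(w) / w.sum()
```

A task whose gradient norm sits above the target (here, the first one) has its weight pulled down, so no single bandwidth ratio dominates the shared encoder's updates.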
-
MAIP: A Multi-Attribute Informativeness Proxy for Image Semantic Broadcasting Communication
Zhuo Zhang, Shuai Xiao, Guipeng Lan, Meng Xi, Jiabao Wen, Jiachen Yang
Keywords: Uncertainty; Broadcasting; Semantic communication; Deep learning; Feature extraction; Optimization; Active learning; Redundancy; Entropy; Adaptation models; Deep Learning; Imaging Data; Communication Systems; Active Learning; Semantic System; Proxy Information; Broadcasting System; Training Set; Random Sampling; Feature Space; Image Classification; Inflection Point; Subset Of Samples; Diverse Methods; Ablation Experiments; Decision Boundary; Consistent Sampling; Representation Ability; Stage Performance; CNN Model; Sampling Uncertainty; Samples In The Feature Space; DNN Model; Gradient Norm; Proxy For Uncertainty; Active Learning Methods; Conduct Ablation Experiments; Image semantic communication; broadcasting; active learning; samples information
Abstract: In image semantic broadcasting communication systems, channel resources are limited, which restricts the transmission and broadcasting of large-scale image data. This paper proposes a deep learning-assisted image semantic broadcasting scheme to improve source efficiency and alleviate communication resource pressure at the transmission terminal. We adopt an image informativeness evaluation method to screen highly informative image data and implement this data-driven source optimization scheme. Specifically, we propose a Multi-Attribute Informativeness Proxy (MAIP) method that integrates fine-grained information attributes such as uncertainty, novelty, and diversity to evaluate and screen image semantic broadcast data, supporting the formation of optimal image data broadcast transmission strategies. To demonstrate the effectiveness of the proposed MAIP, we compare it with state-of-the-art methods on three benchmarks, CIFAR-10, mini-ImageNet, and Fashion-MNIST, using active learning as a validation experiment.
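As a rough illustration of combining informativeness attributes, the snippet below scores candidate images by predictive entropy (an uncertainty term) plus distance to the nearest already-selected sample in feature space (a diversity term). This is a toy proxy in the spirit of the abstract, not the MAIP method itself; the function name and attribute weights are illustrative.

```python
import numpy as np

def informativeness(probs, feats, selected, w_unc=0.5, w_div=0.5):
    """Toy multi-attribute score: predictive entropy (uncertainty) plus
    distance to the nearest already-selected sample in feature space
    (diversity). High scores mark images worth broadcasting first."""
    unc = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    if len(selected) == 0:
        div = np.ones(len(feats))                  # nothing selected yet
    else:
        d = np.linalg.norm(feats[:, None, :] - selected[None, :, :], axis=2)
        div = d.min(axis=1)
    return w_unc * unc + w_div * div
```

Greedily selecting the top-scoring image, appending its features to `selected`, and rescoring gives a simple data-driven transmission order.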
-
Adaptive Latitude-Aware and Importance-Activated Transform Coding for Learned Omnidirectional Image Compression
Hui Hu, Yunhui Shi, Jin Wang, Nam Ling, Baocai Yin
Keywords: Image coding; Transforms; Convolutional codes; Redundancy; Transform coding; Standards; Codecs; Transformers; Kernel; Entropy; Image Compression; Omnidirectional Images; Distortion; Rectangular; Contextual Information; Spatial Features; Feed-forward Network; Convolution Kernel; Latent Representation; Dilated Convolution; Compression Method; Learning Image; Adaptive Modulation; Feature Representation; Receptive Field; Learning-based Methods; Residual Block; Nonlinear Transformation; Channel Dimension; Adaptive Selection; Compact Representation; Entropy Model; Least Significant Bit; Latitude Regions; Hyperprior; Panoramic Images; Transformer-based Methods; Single Convolution; Loss Function Optimization; Multiple Convolution; Omnidirectional image compression; equirectangular projection (ERP); latitude adaptive transform coding; importance activation
Abstract: Based on the measured latitude and longitude, users can freely view different perspectives of an omnidirectional image. Typically, omnidirectional images are represented in the equirectangular projection (ERP) format. Although ERP images suffer from distortion and redundancy due to oversampling, making traditional codecs inefficient, they maintain visual consistency and enhance compatibility with deep learning-based image processing tools. This has led to the emergence of end-to-end omnidirectional image compression methods based on the ERP format. However, transform coding, a key component in learned planar image compression, has not yet been fully explored in the domain of learned omnidirectional image compression. In this paper, we propose a transform coding method with adaptive latitude-aware and importance-activated features for omnidirectional image compression. Specifically, the adaptive latitude-aware mechanism comprises two modules. The first, termed the Adaptive Latitude-aware Module (ALAM), employs rectangular dilated convolution kernels of multiple sizes to perceive distortion redundancy across different latitudes, followed by latitude-adaptive weighting to select optimal features for the respective latitudes. The second, named the Multi-scale Convolutional Gated Feedforward Network (MCGFN), fully exploits local contextual information while suppressing the feature redundancy induced by the diverse dilated convolutions of the first module. Furthermore, to further reduce ERP redundancy, we design an importance-activated spatial feature transform module that regulates latent representations to allocate more bits to significant regions. Experimental results demonstrate that our proposed method outperforms the VVC standard and existing learning-based omnidirectional image compression approaches at medium-to-high bitrates while maintaining low computational complexity.
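The latitude-dependent oversampling that motivates the latitude-aware design can be quantified with the standard cos-latitude row weighting used by ERP quality metrics such as WS-PSNR; a minimal sketch (the function name is ours):

```python
import numpy as np

def erp_row_weights(height):
    """cos-latitude weight per ERP image row: rows near the poles are
    oversampled by the equirectangular projection, so each pixel there
    covers less solid angle and carries less information (this is the
    per-row weighting used by WS-PSNR)."""
    j = np.arange(height)
    return np.cos((j + 0.5 - height / 2) * np.pi / height)
```

The rapid fall-off toward the poles is exactly the redundancy that latitude-adaptive transforms and importance-activated bit allocation try to exploit.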
-
LFIC-DRASC: Deep Light Field Image Compression Using Disentangled Representation and Asymmetrical Strip Convolution
Shiyu Feng, Yun Zhang, Linwei Zhu, Sam Kwong
Keywords: Image coding; Feature extraction; Correlation; Convolution; Image reconstruction; Data mining; Strips; Superresolution; Bit rate; Transforms; Light Field; Image Compression; Disentangled Representation; Light Field Images; Deep Image Compression; Strip Convolution; Light Field Image Compression; Spatial Information; Feature Space; Convolution Operation; Bitrate; Coding Efficiency; Space Correlation; Angular Information; Convolutional Neural Network; Attention Mechanism; Receptive Field; Generative Adversarial Networks; Visual Comparison; Convolution Kernel; Variational Autoencoder; Simple Convolution; Decoding Time; Latent Representation; Network Compression; Residual Block; Video Encoding; Video Coding; Vertical Ones; Dictionary Learning; Deep learning; light field; image compression; disentangled representation; asymmetrical strip convolution
Abstract: A Light-Field (LF) image is an emerging form of 4D data on light rays that can realistically present the spatial and angular information of a 3D scene. However, the large data volume of LF images is the most challenging issue in real-time processing, transmission, and storage. In this paper, we propose an end-to-end deep LF Image Compression method Using Disentangled Representation and Asymmetrical Strip Convolution (LFIC-DRASC) to improve coding efficiency. First, we formulate the LF image compression problem as learning a disentangled LF representation network and an image encoding-decoding network. Second, we propose two novel feature extractors that leverage the structural prior of LF data by integrating features across different dimensions; meanwhile, a disentangled LF representation network is proposed to enhance LF feature disentangling and decoupling. Third, we propose the LFIC-DRASC for LF image compression, where two Asymmetrical Strip Convolution (ASC) operators, i.e., horizontal and vertical, are proposed to capture long-range correlation in the LF feature space. These two ASC operators can be combined with square convolution to further decouple LF features, which enhances the model's ability to represent intricate spatial relationships. Experimental results demonstrate that the proposed LFIC-DRASC achieves an average of 20.5% bit rate reduction compared with state-of-the-art methods. Source code and pre-trained models of LFIC-DRASC are available at https://github.com/SYSU-Video/LFIC-DRASC.
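A strip convolution as described, a 1xK (horizontal) or Kx1 (vertical) kernel slid over a feature map, can be sketched with uniform weights as follows. This is a generic illustration of the operator shape only; in LFIC-DRASC the kernel weights are learned, not uniform.

```python
import numpy as np

def strip_conv(x, k, horizontal=True):
    """Asymmetrical strip convolution with a uniform 1xK (horizontal) or
    Kx1 (vertical) kernel over a 2-D feature map, zero-padded so the
    output shape matches the input. The long thin kernel aggregates
    context along one axis only, capturing long-range correlation
    in that direction."""
    pad = k // 2
    if horizontal:
        xp = np.pad(x, ((0, 0), (pad, pad)))
        return sum(xp[:, i:i + x.shape[1]] for i in range(k)) / k
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return sum(xp[i:i + x.shape[0], :] for i in range(k)) / k
```

Applying the horizontal and vertical operators in sequence covers a cross-shaped receptive field at far lower cost than a full KxK square kernel.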
-
On Energy Replenishment Station Site Selection and Path Planning for Drone Video Streaming
Jian Xiong, Junqi Wu, You Zhou, Shiqing Xu
Keywords: Autonomous aerial vehicles; Batteries; Streaming media; Path planning; Heuristic algorithms; Broadcasting; Drones; Multimedia communication; Clustering algorithms; Classification algorithms; Site Selection; Path Planning; Energy Replenishment; Energy Source; Pathfinding; Ant Colony; Real-time Video; Energy Cycle; Video Transmission; Time-of-flight Mass Spectrometry; High Altitude; Local Optimum; Selection Algorithm; Reward Function; Wireless Power Transfer; Markov Decision Process; Point Of Convergence; Sum Of Distances; Q-learning; Ant Colony Optimization; Rapidly-exploring Random Tree; Ant Colony Optimization Algorithm; Security Surveillance; Flight Path; Border Line; State Transfer; Flight Trajectory; Demand Points; Energy Constraints; Set Of Elements; Combined AAVs; borderline patrol; energy cyclic replenishment; energy replenishment station site selection; path planning
Abstract: In recent years, with the advancement of autonomous aerial vehicle (AAV) technologies, small AAVs have been utilized for borderline patrol, especially for uninterrupted real-time video transmission. However, these small AAVs face limitations in conducting long-endurance, long-distance missions relying solely on their initial onboard resources. To address this issue, this paper introduces a novel combined-AAV air resupply system based on energy cycle resupply. In this system, when the task AAV (AAV-T) depletes its energy resources, a ground energy resupply station dispatches a replenishing AAV (AAV-R) to dock with it along the border and transfer energy to it, ensuring a continuous energy supply. To tackle the challenge of siting the energy recharge stations, we propose a greedy siting algorithm utilizing Monte Carlo methods and an algorithm based on ant colony optimization and clustering. Simulations demonstrate that the number of energy recharge stations can be reduced to 47.6%-52.9% of that required by the AAV-T autonomous return recharge scheme. Additionally, we present a Q-learning-based energy cycle resupply algorithm for AAV-R path planning, offering practical applications in real-world borderline patrol scenarios.
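The Q-learning machinery underlying the path-planning algorithm reduces to the standard tabular update; a minimal sketch follows. The state/action encoding and reward shaping for the AAV-R planning problem come from the paper and are not reproduced here.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward the bootstrapped
    target r + gamma * max_a' Q[s_next, a'] with learning rate alpha."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

In the AAV-R setting, a state would encode the drone's position and remaining energy, an action the next waypoint, and the reward would favor reaching the AAV-T rendezvous with energy to spare.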
-
Environment Information Enhanced Neural Adaptive Bitrate Video Streaming for Intercity Railway
Liuchang Yang, Guanghua Liu, Shuo Li, Jintang Zhao, Tao Jiang
Keywords: Rail transportation; Streaming media; Throughput; Quality of experience; Base stations; Handover; Bit rate; Doppler effect; Heuristic algorithms; Schedules; Adaptive Streaming; Intercity Railway; Prediction Accuracy; Dynamic Environment; Base Station; Network State; Quality Of Experience; Training Speed; Deep Reinforcement Learning; Video Quality; Tidal Forcing; Network Load; Video Transmission; Adaptive Algorithm; Learning-based Methods; Actor Network; Doppler Shift; Network Throughput; Residual Component; Buffer Size; Seasonal Component; Trend Component; Video Playback; Railway System; Periodic Fluctuations; LSTM Network; Download Time; Link Distance; Impact Speed; Impact Of Distance; Adaptive video streaming; environmental information; deep reinforcement learning; quality of experience; intercity railway
Abstract: Intercity railways are vital to modern transportation systems, providing high-speed and efficient connections between cities. With the increasing demand for onboard entertainment and real-time monitoring systems, ensuring high Quality of Experience (QoE) video transmission has become a critical challenge. The unique characteristics of intercity railways, such as predictable railway schedules, spatial routes, and passenger-induced tidal effects, offer significant opportunities for optimizing video transmission performance. However, existing video streaming solutions fail to fully leverage these characteristics, resulting in inefficient bandwidth utilization, unstable video quality, and frequent interruptions caused by rapid train velocity, frequent handovers, and fluctuating network loads. This paper proposes an Environmental Information Enhanced adaptive video streaming (EIE-ABR) scheme that integrates environmental information with advanced techniques to address these challenges. First, the scheme employs Deep Reinforcement Learning (DRL) to model the dynamic relationship between train speed and base station distance, enabling proactive bitrate adjustments in response to fluctuating network conditions. Second, EIE-ABR uses seasonal-trend decomposition (STL) to capture throughput variations driven by periodic patterns, such as railway schedules and tidal effects, as well as abrupt disruptions from handovers or link failures. By combining DRL with STL, EIE-ABR achieves accurate throughput prediction and adapts effectively to the highly dynamic intercity railway environment. Simulation results show that EIE-ABR outperforms existing ABR algorithms, achieving an 11.22% improvement in average QoE reward.
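The seasonal-trend split applied to throughput traces can be illustrated with a naive moving-average decomposition. STL proper uses iterated Loess smoothing with robustness weights; this sketch only shows the trend/seasonal/residual structure the abstract refers to, with names of our choosing.

```python
import numpy as np

def seasonal_decompose(x, period):
    """Naive seasonal-trend split of a throughput trace: a centred moving
    average estimates the trend, per-phase means of the detrended series
    give the seasonal component, and the rest is the residual (which
    captures abrupt events such as handovers or link failures)."""
    x = np.asarray(x, float)
    trend = np.convolve(x, np.ones(period) / period, mode="same")
    detrended = x - trend
    one_cycle = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(one_cycle, len(x) // period + 1)[:len(x)]
    resid = x - trend - seasonal
    return trend, seasonal, resid
```

For a railway trace, `period` would correspond to the schedule's repetition interval; the predictor then extrapolates trend and seasonal parts while treating large residuals as disruption signals.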
-
Secure Video Quality Assessment Resisting Adversarial Attacks
Ao-Xiang Zhang, Yuan-Gen Wang, Yu Ran, Weixuan Tang, Qingxiao Guan, Chunsheng Yang
Keywords: Perturbation methods; Security; Quality assessment; Video recording; Robustness; Data mining; Closed box; Training; Visualization; Multimedia communication; Quality Assessment; Video Quality; Adversarial Attacks; Video Quality Assessment; General Principles; Temporal Information; Video Frames; Spatial Sampling; Adversarial Perturbations; General Defense; Pearson Correlation; Computational Complexity; Spatial Information; Semantic Information; Defense Strategy; Discrete Set; Human Visual System; Motion Information; Model Quality Assessment; Mean Opinion Score; Original Video; Defensive Effect; Attack Methods; Video Information; Spatial-temporal Information; Projected Gradient Descent; Entire Frame; No-reference video quality assessment; adversarial defense; closed-box attack; model security
Abstract: The exponential surge in video traffic has intensified the imperative for Video Quality Assessment (VQA). Leveraging cutting-edge architectures, current VQA models have achieved human-comparable accuracy. However, recent studies have revealed the vulnerability of existing VQA models to adversarial attacks. To establish a reliable and practical assessment system, a secure VQA model capable of resisting such malicious attacks is urgently demanded. Unfortunately, no attempt has been made to explore this issue. This paper makes a first attempt to investigate general adversarial defense principles, aiming to endow existing VQA models with security. Specifically, we first introduce random spatial grid sampling on the video frame for intra-frame defense. Then, we design pixel-wise randomization through a guardian map, globally neutralizing adversarial perturbations. Meanwhile, we extract temporal information from the video sequence as compensation for inter-frame defense. Building upon these principles, we present a novel VQA framework from a security-oriented perspective, termed SecureVQA. Extensive experiments indicate that SecureVQA sets a new benchmark in security while achieving competitive VQA performance compared with state-of-the-art models. Ablation studies delve deeper into analyzing the principles of SecureVQA, demonstrating their generalization and contributions to the security of leading VQA models. The code is available at https://github.com/GZHU-DVL/SecureVQA.
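The intra-frame defense principle, randomizing which pixels the model actually sees, can be sketched as random patch sampling on a grid. This is a toy version of the idea, not the SecureVQA sampling module; the parameters and function name are illustrative.

```python
import numpy as np

def random_grid_sample(frame, patch=32, grid=4, rng=None):
    """Randomized spatial grid sampling: split the frame into grid x grid
    cells and crop one patch at a random offset inside each cell. Since
    the offsets are drawn at inference time, an attacker cannot know in
    advance which pixels feed the quality model."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = frame.shape[:2]
    ch, cw = h // grid, w // grid
    out = []
    for i in range(grid):
        for j in range(grid):
            y = i * ch + rng.integers(0, max(ch - patch, 1))
            x = j * cw + rng.integers(0, max(cw - patch, 1))
            out.append(frame[y:y + patch, x:x + patch])
    return np.stack(out)
```

Because the crop positions change on every forward pass, a perturbation optimized against one sampling pattern loses much of its effect on the next.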
-
ADCMT: An Augmentation-Free Dynamic Contrastive Multi-Task Transformer for UGC-VQA
Hui Li, Kaibing Zhang, Jie Li, Xinbo Gao, Guang Shi
Keywords: Videos; Quality assessment; Contrastive learning; Distortion; Feature extraction; Transformers; Multitasking; Degradation; Convolutional neural networks; User-generated content; Quality Assessment; Perception Of Quality; User-generated Content; Primary Task; Self-supervised Learning; Video Quality; Auxiliary Task; Spatial Features; Latent Space; Gated Recurrent Unit; Pooling Operation; Human Visual System; Linear Projection; Number Of Heads; Discrete Cosine Transform; Spatial Feature Extraction; Artificial Noise; Video Duration; Multi-head Self-attention; Vision Transformer; Multi-task Framework; Objective Assessment Methods; Supervisory Signal; Pre-trained ResNet-50; Model Quality Assessment; Pretext Task; Mean Shift Algorithm; Contrastive Loss; Global Pooling; Data Augmentation; Augmentation-free; multi-task transformer; supervised contrastive learning; user generated content; video quality assessment
Abstract: Quantifying the quality of user-generated content (UGC) videos is particularly challenging due to the presence of complex multi-source distortions and the limited availability of annotated samples. Many current approaches to UGC video quality assessment (UGC-VQA) surmount these dilemmas by applying distortion augmentation and contrastive learning strategies to enhance performance. However, the distribution of augmented samples deviates from that of raw UGC videos, resulting in limited improvement. In this paper, we propose a novel Augmentation-free Dynamic Contrastive Multi-task Transformer (ADCMT) for UGC-VQA. Specifically, the primary task of quality score regression and an auxiliary task of feature recalibration are jointly addressed using a supervised contrastive learning multi-task transformer. The quality label space is partitioned into several subspaces to coarsely and dynamically guide the feature reconstruction in each mini-batch, enhancing its quality-awareness capabilities. This approach ensures that the distribution of embedded perceptual features aligns more closely with quality perception, effectively yielding fine-grained quality score regression. Thorough experiments carried out on six publicly available UGC-VQA databases, KoNViD-1k, CVD2014, LIVE-Qualcomm, LIVE-VQC, YouTube-UGC, and LSVQ-Subset, demonstrate that the proposed ADCMT achieves significant performance improvement over other state-of-the-art competitors. The source code will be available at https://github.com/kbzhang0505/ADCMT.
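The partitioning of the quality label space into subspaces, whose members then act as positives for supervised contrastive learning, might look like the following equal-width binning sketch. ADCMT's partitioning is dynamic and per-mini-batch; this shows only the basic idea, with names of our choosing.

```python
import numpy as np

def partition_label_space(scores, n_sub=4):
    """Equal-width partition of a batch's quality scores into n_sub
    subspaces, returning a subspace id per sample. Samples sharing an
    id are treated as positives in a supervised contrastive loss;
    samples from different subspaces are pushed apart."""
    s = np.asarray(scores, float)
    edges = np.linspace(s.min(), s.max(), n_sub + 1)
    return np.clip(np.digitize(s, edges[1:-1]), 0, n_sub - 1)
```

Recomputing the edges per mini-batch makes the grouping adapt to whatever quality range that batch covers, which is the "dynamic" aspect the abstract describes.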